SUMMARY

There  exists  a  set  of  problems  that  conventional  computer  science 
approaches  find  intractable  or  prohibitively  expensive.  This  set  includes 
image,  natural  language  and  speech  understanding,  along  with  expert 
reasoning.  Interestingly  enough,  these  problems  are  easily  solved  by 
humans  so  a  solution  must  exist.  The  computer  science  community  has  taken 
up  this  challenge  and  the  field  of  Artificial  Intelligence  (AI)  developed 
to  address  these  problems.  Unfortunately,  the  solution  strategies 
developed by the AI researchers so far possess several shortcomings. In
particular,  to  implement  the  solutions  for  meaningful  applications  requires 
the  development  of  computers  far  different  than  the  digital  electronic 
workhorses  of  the  past.  It  is  not  clear  whether  the  presently  available 
electronic  computer  technology  is  even  capable  of  building  machines  that 
can solve these problems. Therefore, in this report we identify ways in
which an alternative technology, optical computing, may address the critical
shortcomings of electronic approaches.

In  pursuit  of  this  goal,  the  first  section  of  this  report,  a  book 
chapter  and  a  paper  entitled  "Optics  and  Symbolic  Computing",  delineates 
the  focus  of  a  few  potentially  fruitful  research  efforts  for  optical 
computing  scientists.  The  goal  of  this  approach  is  to  review  traditional 
AI  problems  and  methods  and  identify  critical  points  where  the  use  of 
optical  techniques  may  allow  higher  computational  throughputs.  The  chapter 
begins  by  defining  symbolic  computing  with  regard  to  the  characteristics 
and  representations  of  knowledge.  Moreover,  this  section  highlights  the 
typical  operations  found  in  a  symbolic  computing  system  and  contrasts  their 
attributes  with  those  of  conventional  numeric  computing.  Next  the  authors 
review  the  strategies  for  solving  the  traditional  AI  problems  and  identify 
the  computational  bottlenecks  in  proposed  systems.  In  particular,  the 
potential  for  optics  to  address  critical  bottlenecks  is  identified.  The 
chapter  concludes  by  introducing  sequential  and  parallel  symbolic  computing 
architectures and their possible optical implementations. The optical
architectures proposed range from replacement of electrical connections
with optical connections to all-optical machines. Avenues for future
research that will provide the most benefits are the following: high-speed
global interconnects, reconfigurable interconnects, high fan-out optical
elements, and multiport components for parallel processing.

The  AI  problem  most  suited  for  the  immediate  impact  of  optics  is  image 
understanding.  Therefore,  the  next  section  explores  the  algorithms  and 
implementations  of  image  understanding  by  computers.  The  significance  of 
image understanding to military systems and advanced manufacturing
environments is growing because of the need for remote and intelligent sensors.
Hence  the  report  cites  many  image  understanding  algorithms  that  may  have 
parallel  implementations  suitable  for  optical  systems.  The  role  for  optics 
is  identified  as  a  low-level,  front-end  preprocessor  for  a  hybrid  digital- 
optical  system.  The  impact  of  efficient  and  rich  low-level  operations  on 
higher  level  symbolic  processing  is  emphasized  as  the  main  reason  why  a 
modular  optical  preprocessing  system  with  high-level  control  is  needed. 
The interface and feedback control between the high- and low-level
operations are identified as the critical components for the construction of a
workable  image  understanding  engine.  In  this  vein,  relaxation  is  proposed 
to  bridge  the  gap  between  all  processing  stages.  It  is  compatible  with  the 
modular  and  highly  parallel  processors  possible  with  optics  and  may  be 
useful  in  reconstruction  of  a  meaningful  representation  of  a  scene. 

Another candidate problem for optics-based symbolic computers is in
the  area  of  adaptive  knowledge-based  systems.  In  a  paper  presented  at  FJCC 
entitled  "An  Organizational  Framework  for  Comparing  Adaptive  Artificial 
Intelligence  Systems",  the  inherent  limitations  of  traditional  expert 
systems  are  contrasted  with  the  advantages  of  adaptive  systems.  Various 
adaptive  systems  are  compared  with  respect  to  the  following  features: 
representation;  storage  and  recall  mechanisms;  control  strategy;  adaptation 
of  storage  and  control  structure.  From  this  analysis,  the  design  of  an 
ideal  adaptive  system  is  outlined.  Since  expert  systems  of  the  future  will 
probably  resemble  the  adaptive  systems  of  today,  optical  machines  for 
knowledge-based  systems  should  incorporate  some  of  the  features  presented 
in this paper.

A critical feature of knowledge-based AI systems is the need for rapid
recall of data from incomplete, contradictory and noisy information. In
this regard, optical associative memories that possess these capabilities
may play a key role in the development of robust AI systems. Therefore, in
this section, an associative memory is proposed that has a large storage
capacity and can focus its attention on specific information in response to
external influences. Attention is crucial in constraining the search
procedure. Relational data base machines are described and the need for a
processor that can associate data is established. The associative memory
presented in this section can be cascaded to form a heteroassociative
memory or, with the addition of logic units, perform many functions required
in relational algebra machines.

In addition to optical relational algebra machines, optical matrix
algebra processors may be of tremendous importance to the realization of
optical symbolic computers. Specifically, matrix algebra is useful in
low-level image processing and may enable the development of inference engines
and directed graph processors. In light of their significance, optical
matrix algebra processor architectures were classified according to the
degree of parallelism employed and the type of interconnections.

In  conclusion,  optics  may  be  able  to  enhance  the  performance  of 
computers  applied  to  AI  problems.  Several  candidate  areas  where  the 
benefits of optics would increase the capabilities of proposed
architectural implementations were identified. Where appropriate, specific
optical architectures were proposed. This semi-annual technical report is
a  compilation  of  technical  reports,  briefings,  and  papers  delivered  under 
the  terms  of  the  contract.  Future  research  will  focus  on  the  role  of 
optics  in  ubiquitous  symbolic  operations  like  compare-and-exchange  and 
pattern  matching.  In  addition,  new  computational  models  that  may  be  better 
suited for describing symbolic computation will be considered.

The incorporation of intelligence into computational systems is
rapidly gaining momentum with both the computer science community and
a large segment of computer system users. The field has been traditionally
labeled "artificial intelligence," or "AI" for short. It is difficult to
specify exactly what is meant by AI, however, since intelligence is a
relative merit which cannot be precisely quantified or defined. Some aspects
of intelligence have been in computers from the beginning, even though AI
conjures up an image of very advanced computer science. After all, memory
capabilities are usually associated with intelligence and even the
earliest computers incorporated some form of memory. A working
understanding of what is meant by AI might succinctly be stated as imparting to
computers those attributes which we associate with the human thought
processes that are notably different from the way in which conventional
computers operate.

An  important  characteristic  of  AI  systems  is  their  incorporation  and 
utilization  of  knowledge  in  each  computation.  Computer  scientists  have 
been  struggling  for  the  last  two  decades  to  determine  how  best  to  represent 
knowledge  in  computing  machines.  Numerous  techniques  have  evolved  for 
creating,  manipulating,  and  storing  collections  of  symbolic  structures. 
These  structures,  or  knowledge  elements,  can  be  used  to  represent  objects, 
events,  knowledge  about  how  to  do  things,  and  knowledge  related  to  what  is 
known  (or  meta-knowledge).  Collectively,  this  set  of  symbolic  structures 
is  referred  to  as  the  knowledge  base  of  the  AI  system. 

Why  does  optical  processing  look  attractive  for  performing  symbolic 
manipulations  or  computing?  This  chapter  is  designed  to  answer  this 
question in some detail, but on the surface one could readily cite
computational throughput and operation compatibility. The need for computational
throughput can be appreciated by comparing the throughput performance of
numeric computers against that of dedicated LISP machines (LISP, or LISt
Processing, being the symbolic computing language of choice in the US AI
community). Figure 1 is representative of system performances, depicting a
typical measure of AI computing power on the vertical axis and operational
speed on the horizontal axis (refs. 1, 2). LIPS, or Logical Inferences Per
Second, is used here as a measure of intelligence, since no functional
equivalent of an "IQ test" currently exists for AI systems. The figure
illustrates the slow speed of these LISP machines relative to
current-generation "supercomputers" and next-generation multiprocessors. The need to
drastically increase the processing rate of AI systems, coupled with the
limitations of current uniprocessor architectures, has resulted in a major
research impetus to explore parallelism to enhance the speed of these
machines and render them far more useful as modern computing systems. This
utilization of parallelism indicates a potential synergism between symbolic
and optical processing.

Figure 1. Trends in Numeric and Symbolic Computation Technology
(Source: IEEE)

Interestingly  enough,  the  numeric  "supercomputers"  execute  AI 
functions  at  a  greater  rate  than  dedicated  AI  machines.  This  may  imply 
that raw speed is an important component in overcoming existing AI
computational bottlenecks. It also may indicate that architectures designed to
improve  numeric  throughput  will  be  useful  in  symbolic  computation,  and  vice 
versa.  The  issue  of  mapping  the  computational  structure  onto  desired 
functionality  is  currently  a  global  issue  of  concern,  and  crosses  all 
boundaries  of  computer  science  and  mathematics. 

The second factor potentially linking optical and symbolic
processing, that of operation compatibility, derives from the need to perform
correlation,  searching,  and  matching  types  of  operations  on  symbolic  data. 
Many  of  these  operations  do  not  require  high  computational  accuracy.  The 
application  of  optics  to  symbolic  computing  will  likely  avoid  what  has 
traditionally  been  the  "Achilles  heel"  of  optical  computing  -  the 
difficulty  in  achieving  more  than  a  few  bits  of  accuracy.  In  addition,  the 
dependence  of  symbolic  computing  on  correlation  functions  (and  their 
isomorphs)  may  provide  an  excellent  opportunity  to  enhance  symbolic 
computing  performance  with  the  use  of  optical  correlators. 

The  next  section  will  describe  the  fundamental  attributes  of  symbolic 
computing,  and  will  compare  it  with  techniques  commonly  utilized  in  numeric 
computing. Following this introductory discussion, we will present a
description of the most important functional capabilities that are based on
symbolic  computing.  This  will  lead  to  a  discussion  of  problem  areas  that 
are  likely  to  be  faced  in  achieving  these  capabilities.  These  sections  do 
not  address  optical  symbolic  computing,  but  the  reader  will  likely  find  the 
information  contained  therein  to  be  important  to  gaining  an  understanding 
of  the  synergism  between  optical  and  symbolic  forms  of  computing.  In  fact, 
these  sections  are  intended  as  an  introduction  to  the  subject  of  artificial 
intelligence,  and,  because  of  the  breadth  of  the  topic,  we  have  had  to 
limit  our  discussions  to  only  the  major  aspects.  Section  IV  will  then  lay 
the  framework  for  symbolic  computing  with  optics.  Fundamental 
architectural  concepts  will  be  described  with  an  emphasis  on  how  they 
differ from the more conventional computer architectures. This will be
followed by the authors' concepts of how optics could enhance the
performance of symbolic processors.

In  trying  to  cover  such  a  broad  topic,  we  have  attempted  to  provide  an 
introduction  to  each  of  the  main  disciplines  within  symbolic  computing. 
However,  in  many  cases  we  have  had  to  sacrifice  the  details  of  a  particular 
topic  in  order  to  provide  a  balanced  presentation.  In  other  cases,  the 
material is simply beyond the scope of this book. For additional
information, the interested reader can consult literature such as The Handbook of
Artificial  Intelligence,^  a  three  volume  treatise  which  provides  an 
excellent overview of AI technology.

CHAPTER II

WHAT  IS  SYMBOLIC  COMPUTING? 

In  order  to  realize  computers  that  are  more  capable  of  simulating 
human  thought  processes  than  is  possible  with  today's  numeric  computers, 
the  bit  patterns  within  the  computers  must  be  made  to  represent  arbitrary 
symbols  in  addition  to  arithmetic  ones.  For  example,  the  computer  should 
be  able  to  provide  an  answer  to  the  question  "What  path  should  I  take  to 
reach my destination (given all I know about my environment and my
capabilities)?" as well as to an arithmetic question such as "What is the sum
of 2 plus 4?" Humans routinely make routing decisions, but numeric
computation was not developed to handle such problems. First of all, encoding
the question itself into the computer poses a formidable problem. Second,
if the destination and alternative path information is provided by a visual
scene (i.e., visual navigation), the resulting object recognition and image
understanding  tasks  can  be  immeasurably  enhanced  by  symbolic  manipulation. 
Finally,  the  decision  process  will  likely  draw  on  one's  knowledge  of  the 
world,  and  this  also  relies  on  symbolic  constructs. 

A.  KNOWLEDGE  CHARACTERISTICS 

A system that we would describe as knowledgeable has three main
attributes: the capability of acquiring additional knowledge, the ability to
retrieve appropriate information from a knowledge base, and the power of
reasoning with the retrieved information to solve problems. In defining
knowledge, we recognize that several other interpretations are possible,
each with an independent set of characteristics. We believe that our set
of attributes comprises a convenient definition for knowledgeable systems
and one which will be used to provide a format for the following
discussion; however, it suffers a malady common to any attempt to neatly dissect
and categorize a complex phenomenon: an inability to ascertain the
independence of the attributes. This will become evident in the discussion
of acquisition, in which reasoning (the third attribute) is described as a
method of acquisition (the first attribute); this is what we know as learning.

Acquisition of knowledge can occur via three different routes, as shown
in Figure 2. First of all, knowledge can be placed into the computer by a
programmer working in conjunction with a knowledge engineer. In the case
of expert systems (to be discussed in Section III.E), the knowledge
engineer acquires or extracts knowledge from an expert and passes it on to
the programmer, who embeds it within the computer. This process is
referred to as knowledge engineering. Secondly, in the near future we
expect more and more knowledge to be automatically acquired via sensors.
This will become an increasingly popular technique with the advent of
speech recognition and vision systems (to be discussed in Sections III.B
and III.C). Both programmed and sensed acquisition may be accomplished by
just rote techniques, but most often the information is classified in some
way to facilitate the retrieval process. Classification is a process of
associating each input with related items to form classes which aid
significantly in identifying relevant data during knowledge base searches.
Classification is also commonly known as linking and lumping. If one
knows, upon acquiring a given piece of information, that it will be
associated with an entity already in the knowledge base, then a link is
specified between the two, and if many entities are likely to be used
together, they are lumped into a larger structure. The reader should have
a good appreciation and understanding of classification following the
discussions on retrieval, knowledge representation, and search.

The third category of acquisition is learned knowledge. The field of
machine learning is still in its infancy, but three areas that have shown
recent progress are parameter adjustment, discovery, and analogical
reasoning (ref. 18). The variation of parameters and stimuli is a standard
scientific technique for learning. Two important areas of application for
AI are in variation of classification parameters for knowledge acquisition
(changing of classes into which objects are placed), and in adjusting
heuristic function parameters for improved problem solving (to be discussed
in Section II.C on Search). Discovery usually entails problem solving, and
therefore relies heavily on reasoning. Reasoning also underlies
learning-by-analogy, which involves filling in missing information about some entity
if it is known that the partially defined entity is "like" some already
known entity. The representation of knowledge via frames and scripts
(discussed in Section II.B) facilitates analogical reasoning since one can
readily pass attributes between two frames or scripts that are said to be
alike.

Figure 2. Commonly Used Methods of Acquiring Information

Retrieval  is  a  very  real  problem  for  AI  systems  for  several  reasons, 
not  the  least  of  which  is  due  to  the  large  sizes  of  the  data  bases.  Not 
only  do  these  data  bases  contain  the  collection  of  facts  that  are  relevant 
to  the  problem  domain  of  a  particular  system,  but  they  contain  rules  that 
enable  the  intelligent  manipulation  of  the  facts.  One  common  technique  of 
retrieval  utilizes  multiple  indices  which  tag  facts  by  one  or  more  of  their 
attributes.  For  example,  a  holographic  lens  could  be  indexed  by  such 
attributes  as  "optical",  "diffraction  grating",  "conformal",  "narrowband", 
and "lightweight". The LISP programming language, which is the most widely
used language in the AI community, facilitates the assignment of attributes
to  objects  in  that  it  centers  around  lists  of  related  symbols.  The  above 
example  could  be  encoded  in  LISP  using  a  "property  list"  as: 

(holographiclens  optical  diffractiongrating  conformal 
narrowband  lightweight). 
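
The multiple-index scheme described above can be sketched as follows. This is a minimal illustration in Python rather than LISP, not a description of any particular system: the function names build_index and retrieve are hypothetical, and the second entry, "glasslens", is invented only so that a query has something to discriminate against.

```python
from collections import defaultdict

# Property lists in the style of the LISP example: the first symbol names
# the object and the remaining symbols are its attributes.
# "glasslens" is a made-up entry used only for illustration.
PROPERTY_LISTS = [
    ("holographiclens", "optical", "diffractiongrating", "conformal",
     "narrowband", "lightweight"),
    ("glasslens", "optical", "refractive", "broadband"),
]


def build_index(property_lists):
    """Map each attribute to the set of objects tagged with it."""
    index = defaultdict(set)
    for name, *attributes in property_lists:
        for attribute in attributes:
            index[attribute].add(name)
    return index


def retrieve(index, *attributes):
    """Return the objects that carry every requested attribute."""
    if not attributes:
        return set()
    matches = [index.get(attribute, set()) for attribute in attributes]
    return set.intersection(*matches)


INDEX = build_index(PROPERTY_LISTS)
print(sorted(retrieve(INDEX, "optical")))
print(sorted(retrieve(INDEX, "optical", "narrowband")))
```

Each attribute acts as a separate index into the same fact, so a fact can be reached from any of its tags without scanning the whole data base.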

Two other retrieval schemes are based on pattern matching and
contexts. The pattern matching scheme retrieves data according to some
pattern which is related to data categories. A more advanced scheme is
contextual storage, in which data are retrieved according to meanings. As
an example of pattern matching, consider a data base containing the
following lists:

(lightsource laser heliumneon wavelength(x) . . .)
(lightsource laser NdYag . . .)
(lightsource laser diode . . .)
(lightsource arclamp mercury . . .)
(lightsource arclamp xenon . . .)
(powersource 110Vinput 12Voutput . . .)


In  order  to  retrieve  those  elements  representing  light  sources,  one  could 
query  the  data  base  with  a  pattern  denoted  as: 

(lightsource  ?x) 


To find laser entries, the pattern matcher would be specified by:

(lightsource laser ?x)
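
A pattern matcher of this kind might be sketched as below. This is a hedged Python illustration, not the matcher of any actual LISP system. It assumes that a symbol beginning with "?" is a variable and that a variable in the last position of the pattern binds to the remainder of the entry, which is how the two queries in the text appear to behave. The elided fields of the data base entries are omitted, and the names match and query are hypothetical.

```python
# Entries from the example data base, with the elided fields left out.
DATABASE = [
    ("lightsource", "laser", "heliumneon"),
    ("lightsource", "laser", "NdYag"),
    ("lightsource", "laser", "diode"),
    ("lightsource", "arclamp", "mercury"),
    ("lightsource", "arclamp", "xenon"),
    ("powersource", "110Vinput", "12Voutput"),
]


def match(pattern, entry):
    """Return a dict of variable bindings if entry matches, else None.

    Assumption: a trailing "?variable" binds to the rest of the entry.
    """
    bindings = {}
    for i, symbol in enumerate(pattern):
        is_variable = symbol.startswith("?")
        if is_variable and i == len(pattern) - 1:
            bindings[symbol] = entry[i:]
            return bindings
        if i >= len(entry):
            return None
        if is_variable:
            bindings[symbol] = entry[i]
        elif symbol != entry[i]:
            return None
    # A pattern without a trailing variable must cover the whole entry.
    return bindings if len(pattern) == len(entry) else None


def query(pattern, database=DATABASE):
    """Return every entry in the data base that matches the pattern."""
    return [entry for entry in database if match(pattern, entry) is not None]


print(query(("lightsource", "?x")))
print(query(("lightsource", "laser", "?x")))
```

The first query returns the five light source entries and the second only the three laser entries, since the constant symbols must agree position by position before the variable is bound.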


All of these retrieval schemes differ significantly from those used in
numeric computers, which store data according to memory addresses. Numeric
information, being a subset of symbolic information, can be stored and
retrieved via the schemes mentioned above, although likely not as
efficiently. For example, the simple list (+ 2 4 9) associates the numeric
symbols 2, 4, and 9 with the summation operation, and the nested list
(+ (* 3 4) (* 6 3)) associates 3 and 4 with one product operation, 6 and 3
with the other product operation, and the resulting 12 and 18 with the
summation operation. This example has used basic LISP notation, in which the
first element of the list represents the operation to be performed while
the remaining elements are the arguments to be operated upon.
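
The prefix convention can be made concrete with a small evaluator. The sketch below is in Python and represents each list as a tuple; it handles only the two operations used in the example, and the names OPERATIONS and evaluate are hypothetical.

```python
import math

# Only the two operations that appear in the example are supported.
OPERATIONS = {
    "+": sum,
    "*": math.prod,
}


def evaluate(expression):
    """Evaluate a nested prefix expression given as tuples and numbers."""
    if isinstance(expression, (int, float)):
        return expression
    # The first element is the operation; the rest are its arguments.
    operation, *arguments = expression
    values = [evaluate(argument) for argument in arguments]
    return OPERATIONS[operation](values)


print(evaluate(("+", 2, 4, 9)))                    # 15
print(evaluate(("+", ("*", 3, 4), ("*", 6, 3))))   # 12 + 18 = 30
```

The inner lists are evaluated first, so the outer summation only ever sees the numbers 12 and 18.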

In any discussion of retrieval, one must go far beyond the fundamental
techniques for recognizing relevant data in the knowledge base to a
discussion of how one searches for the set or sets of data that can lead to a
defined  goal.  However,  a  discussion  of  search  will  be  delayed  until  after 
knowledge  representation  is  discussed  since  the  two  are  closely  related; 
i.e.,  knowledge  is  usually  represented  in  such  a  way  as  to  facilitate  the 
search  process. 

Reasoning, which is the third capability of knowledge systems, is
required when the system needs information that cannot be retrieved
directly from the knowledge base. AI systems can trade off between large
knowledge bases and complex reasoning procedures. Systems must either have
a high degree of reasoning power or be able to store and retrieve all
relevant information; however, a system that spends too much time searching
for reasoning strategies does not have enough knowledge.

Reasoning may be viewed in terms of a movement in a state space in
which the states represent all possible situations and the movement is from
an initial state(s), representing the current situation(s), to a goal
state(s). The reasoning process in solving practical problems typically
involves passing through many intermediate states. The allowable
transitions between the states are specified either by rules, such as "if...,
then..." statements, or via a linking of facts, such as a directed graph.
The discussion in the next section on knowledge representation will present
various techniques used for state specifications and interstate
transitions. A classic example of the state space concept is the game of chess,
in which the initial state would be the starting locations of all pieces,
and the goal states would be any configuration of the pieces in which all
possible moves by the opponent are illegal and the opponent's king is in
check. The state transitions would be the rules governing the allowable
moves of each piece in each state (each state would have different
allowable moves depending on the locations of the other pieces and the proximity
of pieces to the edge of the board).

Residence in a given state will most likely present the problem solver
with a multitude of possible transitions to other states en route to a
solution. Therefore, search operations also play a major role in the reasoning
process.  Here,  the  search  is  for  the  path  or  paths  through  the  state  space 
of  a  given  problem  domain,  whereas  in  retrieval  it  was  a  search  for 
relevant  data  in  the  knowledge  base.  As  stated  above,  search  operations 
will  be  discussed  following  the  next  section  on  knowledge  representation. 
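
The view of reasoning as a search for a path through a state space can be illustrated with a small sketch. The Python code below uses breadth-first search, chosen here only because it is simple; the text does not prescribe a particular strategy. The state names and transitions are hypothetical, standing in for a directed graph of allowable transitions.

```python
from collections import deque

# Hypothetical state space: each state maps to the states reachable from
# it in one allowable transition (a directed graph).
TRANSITIONS = {
    "start": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["goal"],
    "d": [],
    "goal": [],
}


def find_path(initial, goals, transitions):
    """Breadth-first search; return a shortest path as a list, or None."""
    frontier = deque([[initial]])
    visited = {initial}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if state in goals:
            return path
        for successor in transitions.get(state, []):
            if successor not in visited:
                visited.add(successor)
                frontier.append(path + [successor])
    return None


print(find_path("start", {"goal"}, TRANSITIONS))
```

Each residence in a state offers several possible transitions, and the frontier queue is what keeps track of the alternatives not yet explored.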

B. KNOWLEDGE REPRESENTATION


Fundamental to each of the types of knowledge discussed above is the
problem of how to represent them in a computer in such a way as to
facilitate their interaction, thus producing useful systems. Numerous
representations have evolved, but most are variations or combinations of the
following four: semantic nets, production systems, frames, and logic
systems. Semantic networks are very diverse in nature but are generally
characterized as being graphical representation schemes in which the graph
nodes represent objects or concepts and the links represent inference
procedures that relate the nodes. Figure 3 illustrates a very simplistic
semantic network that could facilitate arriving at the inference that a
Bragg cell has both transverse and longitudinal wave propagation. This
type of knowledge is often referred to as declarative, since it is usually
derived from factual statements of specific knowledge or relationships.
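
A semantic network can be held as a list of labeled links and an inference drawn by chaining them. The Python sketch below is only an illustration: the node names, the relation labels "uses" and "propagates-as", and the particular links are assumptions made for this example and are not taken from Figure 3.

```python
# Hypothetical links of the form (source node, relation, target node).
LINKS = [
    ("braggcell", "uses", "lightwave"),
    ("braggcell", "uses", "acousticwave"),
    ("lightwave", "propagates-as", "transverse"),
    ("acousticwave", "propagates-as", "longitudinal"),
]


def follow(node, relation, links=LINKS):
    """Return the nodes reached from node along links with this relation."""
    return [target for (source, rel, target) in links
            if source == node and rel == relation]


def propagation_modes(node, links=LINKS):
    """Chain "uses" links into "propagates-as" links."""
    modes = set()
    for component in follow(node, "uses", links):
        modes.update(follow(component, "propagates-as", links))
    return modes


print(sorted(propagation_modes("braggcell")))
```

The conclusion that the device involves both propagation modes is stored nowhere explicitly; it is reached only by traversing two links in succession.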

In their most elementary form, production systems represent knowledge
by rules (productions) formulated as "pattern/action" pairs expressed as
"if/then" statements. If the "pattern" segment of the statement (also
known as the antecedent) is true, then the action segment is "fired"; i.e.,
the state of the machine is modified according to the action specified.
Examples of such statements would be: "If the light source is a laser,
then it emits coherent illumination," and "If low cost is important and if
coherency is not required, then use LEDs instead of laser diodes."
Production systems are popular as knowledge representation schemes for expert
systems (to be covered in Section III.E), and are in many cases referred to
as procedural knowledge, since some action typically results from a rule
firing. It should be noted that pattern matching (correlation) plays an
important role in production systems in that rule firings are based on
"matches" between the rule antecedents (the "if" parts) and the problem
states; that is, the matching process determines whether the antecedent is
true or false.
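
The two example rules can be run through a very small production system. The Python sketch below is a simplified illustration under stated assumptions: conditions and actions are reduced to plain symbols, matching is a subset test, and firing a rule simply adds its action to the state. The symbol names and the function forward_chain are hypothetical.

```python
# Each rule pairs an antecedent (a set of conditions) with an action.
RULES = [
    ({"light-source-is-laser"}, "emits-coherent-illumination"),
    ({"low-cost-important", "coherency-not-required"}, "use-leds"),
]


def forward_chain(facts, rules=RULES):
    """Fire rules until no rule adds anything new; return the final state."""
    state = set(facts)
    fired = True
    while fired:
        fired = False
        for antecedent, action in rules:
            # Matching step: the antecedent is true when every one of its
            # conditions is present in the current state.
            if antecedent <= state and action not in state:
                state.add(action)
                fired = True
    return state


print(sorted(forward_chain({"light-source-is-laser"})))
print(sorted(forward_chain({"low-cost-important", "coherency-not-required"})))
```

The second rule fires only when both of its conditions match, which mirrors the role of the matching process in deciding whether an antecedent is true.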

Representation via frames involves organizing data by functional
groups of hierarchically linked attribute-value pairs. Such a
representation is advantageous when dealing with stereotypical concepts such as
that illustrated in Figure 4. This frame, representing knowledge of a
two-dimensional light modulator, might have been referenced by one of several
other frames such as ones for optical processing, input/output devices, or

Figure 3. Simplistic Example of Semantic Network

In predicate calculus, it is possible to combine predicates together
with their arguments (rather than propositions such as "it is illuminated")
using the connectives of propositional calculus, including AND (∧), OR
(∨), NOT (~), EQUIVALENCE (≡), and IMPLIES (→). In addition, one
must introduce quantifiers that assign ranges to variables. For example,
the statement EQUALS (x,y) could mean any one of the following four
relationships: all x equals all y, a given x equals a given y, all x
equals some value of y, or some value of x is equal to any value of y. The
universal quantifier ∀x means that all values of the variable x are to be
considered, while the existential quantifier ∃x means that some particular
value of x is to be considered. For example, if the truth of EQUALS (x,y)
is to be based on all values of x being equal to all values of y, then one
would express it as ∀x ∀y EQUALS (x,y), whereas if the meaning were that
the statement is to be TRUE if a specific value of x equals a specific
value of y, then one would write ∃x ∃y EQUALS (x,y). Examples of expressions
written in predicate calculus notation are:

Light is coherent if it comes from a laser.

∀x (LASERLIGHT (x) → COHERENTLIGHT (x))

Some laser illumination is visible.

∃x (LASERLIGHT (x) ∧ VISIBLELIGHT (x))
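
Over a finite domain, the two quantified expressions can be checked mechanically. The Python sketch below is an illustration only: the three objects in DOMAIN and the predicates assigned to them are hypothetical, and the universal and existential quantifiers are modeled with the built-in all and any.

```python
# Hypothetical finite domain; each object lists the predicates true of it.
DOMAIN = {
    "heliumneon_beam": {"LASERLIGHT", "COHERENTLIGHT", "VISIBLELIGHT"},
    "co2_beam": {"LASERLIGHT", "COHERENTLIGHT"},
    "lamp_glow": {"VISIBLELIGHT"},
}


def holds(predicate, x):
    """Return True if the predicate is true of object x."""
    return predicate in DOMAIN[x]


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


# For all x: LASERLIGHT(x) implies COHERENTLIGHT(x)
all_laser_light_is_coherent = all(
    implies(holds("LASERLIGHT", x), holds("COHERENTLIGHT", x))
    for x in DOMAIN)

# There exists x: LASERLIGHT(x) and VISIBLELIGHT(x)
some_laser_light_is_visible = any(
    holds("LASERLIGHT", x) and holds("VISIBLELIGHT", x)
    for x in DOMAIN)

print(all_laser_light_is_coherent, some_laser_light_is_visible)
```

Note that the object that is not laser light satisfies the implication vacuously, which is why the universal statement holds for the whole domain.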

Before introducing the final fundamental element of logic, that of
functions, let us review the previously introduced elements. These are
summarized in Figure 5. Note that the accepted convention is to express
variables in lower case alphanumerics and to express the constants and
predicate symbols in the upper case. Functions will also be expressed in
the lower case.

Predicates are somewhat limiting since their evaluations return only
TRUE or FALSE values. For example, the predicates WOMAN (WIFE), WOMAN
(FLORENCENIGHTINGALE), and PRESIDENT (UNITEDSTATES, GEORGEWASHINGTON) would
return TRUE provided that in the latter example PRESIDENT (x,y) was
assigned the meaning "y is president of x". WOMAN (GEORGEWASHINGTON)
would, of course, return FALSE. Functions, on the other hand, can return
objects; therefore, they are used as arguments of predicates. An example
would be "wavelength (x)" being defined to retrieve the numeric value of
the wavelength associated with x; i.e., "wavelength (REDLIGHT)" would
return 0.6 microns and "wavelength (CO2LASER)" would return 10.6 microns.
An example of a function as part of a predicate would be VISIBLE
(wavelength (CO2LASER)), which would return FALSE since the CO2 laser
lases in the infrared region of the spectrum instead of the visible region.

Figure 5. Elements of Predicate Calculus
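The distinction between a function, which returns an object, and a predicate, which returns only a truth value, can be sketched as follows. The wavelength values come from the examples in the text; the 0.4 to 0.7 micron band used to define "visible" is an assumption made for this illustration.

```python
# Wavelengths in microns, taken from the examples in the text.
WAVELENGTH_MICRONS = {"REDLIGHT": 0.6, "CO2LASER": 10.6}


def wavelength(x):
    """Function: returns an object (here a number), not a truth value."""
    return WAVELENGTH_MICRONS[x]


def VISIBLE(microns):
    """Predicate: returns only True or False.

    The 0.4-0.7 micron band is an assumed definition of 'visible'.
    """
    return 0.4 <= microns <= 0.7


red_is_visible = VISIBLE(wavelength("REDLIGHT"))
co2_is_visible = VISIBLE(wavelength("CO2LASER"))
```

Here the function call supplies the argument of the predicate, mirroring the nesting in the VISIBLE example above.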

C.  SEARCH 


The extensive sizes of the knowledge bases and the state spaces
required for solving practical symbolic problems dictate the use of some
kind of control strategy for searching either for relevant facts in the
knowledge base or for solution paths through the problem state spaces.
Figure 6 illustrates the various categories of search, from purely random
searching to very domain specific heuristic searching (heuristic implying
the use of problem domain knowledge to guide the search). Although this
simplified block diagram fits search strategies into specific categories,
in actuality one encounters almost a continuum of strategies. The more
random a search is, the longer the time needed to reach the goal. On
the other hand, the more heuristic the search, the more complex are the
control procedures. In the limit, the most complex controls would
incorporate so much knowledge about the search space that the need to
search would be eliminated; i.e., the controls would guide the solution
directly toward the goal. Since both extremes are unrealistic, one
attempts to find a compromise strategy that is best for the problem at
hand. In fact, many AI systems use a combination of general purpose and
specific purpose schemes so that the overall search toward a solution can
adapt to a varying structure of the state space as the search proceeds.

Figure 6. Hierarchy of Search Strategies

Underlying the techniques of search are several general strategies
that can be rendered more or less heuristic depending on the degree to
which problem domain and goal information are used in selecting various
search paths to be tried. Some of these strategies, which will be discussed
below, are: tree versus directed-graph, depth-first versus breadth-first,
and forward versus backward chaining.

The  search  process  may  be  accomplished  by  following  either  a  directed- 
graph  structure,  such  as  illustrated  in  Figure  7a,  or  the  corresponding 
tree  structure  shown  in  Figure  7b.  The  directed-graph  search  remembers  all 
tried  strategies  so  that  the  search  can  return  to  an  earlier  problem  state 
(graph  node)  and  continue  the  search  along  another  path.  The  tree  search, 
on  the  other  hand,  may  duplicate  efforts  in  pursuing  strategies  that  were 
already  tried.  For  example,  in  the  push  toward  a  viable  solution,  the  "C" 
node  might  have  to  be  expanded  a  second  time  (let  us  say  via  A  C  F  ...)  if 
the first attempt (via A B E C) did not reach a goal. The obvious
disadvantage of the tree is the possibility of redundant effort, but the
advantages  are  the  use  of  much  less  memory  in  not  having  to  store  the 
previous  strategies  and  the  freedom  from  running  a  test  on  every  generated 
node  to  determine  if  it  matches  a  previously  generated  node.  The  choice  of 
tree  versus  graph  is  often  decided  by  the  nature  of  the  problem  being 
solved;  i.e.,  how  frequently  are  repeated  states  likely  to  occur.  The 
availability  of  memory  will  also  be  a  deciding  factor. 
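The memory-versus-redundancy tradeoff between tree and directed-graph search can be sketched as follows. The graph below is hypothetical (it is not a transcription of Figure 7), chosen so that node C is reachable along two paths; the `remember_visited` flag is an illustrative name for the bookkeeping that distinguishes the two approaches.

```python
# Hypothetical acyclic state graph in which C is reachable via A-C and A-B-E-C.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": [],
    "E": ["C"],
    "F": [],
    "G": [],
}


def search(start, goal, remember_visited):
    """Depth-first search; returns the nodes expanded, in order.

    With remember_visited=False this behaves as a tree search (no memory of
    earlier states); with True it behaves as a directed-graph search.
    The tree variant terminates here only because GRAPH has no cycles.
    """
    expanded = []
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if remember_visited:
            if node in visited:
                continue  # graph search: skip a previously generated state
            visited.add(node)
        expanded.append(node)
        if node == goal:
            return expanded
        stack.extend(reversed(GRAPH[node]))
    return expanded


# Searching for a goal that does not exist forces a full exploration.
tree_expansions = search("A", "Z", remember_visited=False)
graph_expansions = search("A", "Z", remember_visited=True)
```

In this sketch the tree search expands C (and its successors) twice, while the graph search pays for a `visited` set and a membership test on every node in order to expand each state only once.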

Search  schemes  may  also  differ  with  respect  to  the  order  in  which  the 
nodes  are  searched.  A  strictly  serial  search,  known  as  depth-first, 
pursues  a  given  strategy  (e.g.,  branch  of  a  tree)  until  the  strategy  has 
either  succeeded  in  reaching  a  goal  or  has  been  shown  to  terminate  in  an 
unsuccessful  search.  In  the  latter  case,  one  backtracks  to  the  most 
recently  traversed  node  that  possesses  a  yet  untried  branch.  This  search 
procedure  saves  on  memory  since  unsuccessful  branches  can  be  discarded; 
however, if the problem domain is characterized by long search paths, the
depth-first  search  could  be  exceedingly  costly  in  terms  of  time.  An 
example  of  a  depth-first  search  through  the  tree  illustrated  in  Figure  7b 
might be A B D B E C A C F G ...

In the breadth-first search, on the other hand, all states linked to
the starting state are tested to see if they match the goal state before
proceeding any deeper into the tree or graph. If a goal has not been
reached, then all of these first level states (or nodes) are expanded;
i.e., all of the linked states are generated and tested. For example, the
breadth-first search through the tree of Figure 7b would proceed via the
nodal order A B C D E F H C ... until a goal state is reached. This type
of search could have great utility in parallel processing systems, such as
the types that will be discussed in Section IV.
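The difference in node ordering between the two schemes comes down to whether pending nodes are kept on a stack or in a queue. The sketch below uses a small hypothetical tree (not the tree of Figure 7b) and lists each node once, in the order it is first visited, so backtracking steps are not repeated in the output.

```python
from collections import deque

# Hypothetical tree used only for illustration.
TREE = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": [],
    "E": [],
    "F": [],
    "G": [],
}


def depth_first(root):
    """Follow one branch to its end before trying the next (stack, LIFO)."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the leftmost child is explored first.
        stack.extend(reversed(TREE[node]))
    return order


def breadth_first(root):
    """Test every node at one level before going deeper (queue, FIFO)."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(TREE[node])
    return order


dfs_order = depth_first("A")
bfs_order = breadth_first("A")
```

In the breadth-first version, all nodes in the queue at a given level are independent of one another, which is what makes this ordering attractive for parallel evaluation.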

Up  to  this  point,  problem  solution  and  reasoning  have  been  presented 
as  a  procession  from  a  starting  state  or  states  toward  the  goal  state(s). 
Another  option  that  we  as  humans  sometimes  employ  is  to  start  with  the  goal 
and  proceed  backwards  in  an  attempt  to  satisfy  the  initial  conditions.  In 
backward  reasoning,  or  backward  chaining  as  it  is  known  in  the  AI 
community,  one  first  generates  one  or  more  states  that  could  produce  the 
goal  state  and  tests  to  see  if  a  match  exists  with  the  initial  state(s). 
If  not,  the  search  continues  on  backwards.  In  production  systems,  this 
means  that  the  "then"  parts  of  the  rules  are  matched  and  the  "if"  parts  are 
fired  (i.e.,  each  "if"  part  is  used  to  generate  a  more  forward  node). 
There  are  two  factors  which  would  strongly  influence  the  choice  of  backward 
chaining  over  forward  chaining  -  the  branching  factor  going  backward  versus 
going  forward,  and  the  number  of  goal  states  versus  the  number  of  initial 
states.  That  is,  if  the  generated  search  tree  branches  out  significantly 
more  in  the  forward  search  than  for  the  backward  search  for  a  given  problem 
domain,  then  backward  chaining  would  be  preferable  for  such  problems.  If 
the  branching  is  approximately  the  same  in  both  directions,  the  number  of 
goal  states  versus  the  number  of  initial  states  becomes  a  deciding  factor. 
Backward  chaining  looks  more  appealing  in  solving  synthesis-type  problems 
for  which  there  exists  a  broad  spectrum  of  objects  from  which  to 
synthesize.  An  example  would  be  the  determination  of  which  material 
characteristics are needed to optimally realize a specific device. Here it
would be better to start with the goal state (the device requirements) than
to start with the sets of characteristics for all possible materials.
Chess, on the other hand, could never be reasoned through via backward
chaining due to the extremely large number of ways to attain checkmate.
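Backward chaining over "if/then" rules can be sketched as a recursive procedure that matches the goal against the "then" parts and then tries to establish each "if" condition as a subgoal. The rules and facts below are hypothetical and loosely follow the device-synthesis example; they are assumptions for illustration, not content from the report.

```python
# Hypothetical production rules: (set of "if" conditions, "then" conclusion).
RULES = [
    ({"low_absorption", "high_index"}, "good_waveguide"),
    ({"good_waveguide", "electro_optic"}, "usable_for_modulator"),
]

# Hypothetical initial state: known material characteristics.
FACTS = {"low_absorption", "high_index", "electro_optic"}


def prove(goal, facts, rules, pending=frozenset()):
    """Return True if goal follows from facts by chaining backward."""
    if goal in facts:
        return True  # the goal matches the initial state
    if goal in pending:
        return False  # already trying to prove this goal: avoid circular rules
    for conditions, conclusion in rules:
        # Match the "then" part, then treat each "if" condition as a subgoal.
        if conclusion == goal and all(
            prove(c, facts, rules, pending | {goal}) for c in conditions
        ):
            return True
    return False


can_build = prove("usable_for_modulator", FACTS, RULES)
cannot_build = prove("usable_for_modulator", FACTS - {"electro_optic"}, RULES)
```

Only rules whose conclusions are relevant to the goal are ever examined, which is why this direction is attractive when there are few goal states and many possible starting states.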

For many practical problems, the state spaces are so large that
searches for guaranteed optimum solutions are sacrificed in favor of
using strategy and tactics to constrain the search process and being
satisfied with a good solution rather than necessarily the best.
Search processes involved with the playing of chess are an excellent
example of this; otherwise, the chess problem could not realistically be
solved. As previously mentioned, varying amounts of information can be
provided in order to judiciously guide the various search schemes just
discussed. Techniques such as indexing, factorization, and template
matching fall under the category of general purpose heuristics; however,
their effect in constraining complex searches is characteristically weak,
often necessitating the use of more complex heuristics in conjunction with
these more general purpose ones. Indexing involves using a predetermined
scheme to assign indices to problem states and storing the rules applicable
to each problem state in such a way that they can be associated with the
index for that state. Residence in a particular state can then invoke the
index for that state, which, in turn, can call up all of the rules that
could apply. More appealing is the operation of factorization. If the
knowledge base is divisible into broad sets which have small
cross-correlations, entire sections can be ignored by considering
appropriate problem domain
information.  For  example,  if  the  question  at  hand  deals  with  diagnostics 
for  lung  diseases,  the  system  does  not  need  to  search  any  part  of  its 
knowledge  base  dealing  with  procedures  for  use  in  actual  lung  operations. 
Search  constraint  can  also  be  achieved  via  template  matching,  analogous  to 
classical  pattern  matching  where  the  matching  is  used  to  verify  that  the 
correct  shape,  word,  etc.  has  been  found.  Template  matching  is  frequently 
used  in  speech  recognition,  natural  language  understanding,  and  image 
understanding,  all  of  which  will  be  discussed  in  Section  III. 
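Indexing and factorization can both be pictured as ways of avoiding a scan of the full rule set. The sketch below assumes a hypothetical knowledge base that is partitioned by topic (factorization) and, within each partition, keyed by a problem-state index (indexing); all topic names, state names, and rule names are invented for illustration.

```python
# Hypothetical knowledge base: topic -> state index -> applicable rules.
KNOWLEDGE_BASE = {
    "lung_diagnosis": {
        "persistent_cough": ["consider_bronchitis", "order_chest_xray"],
        "shortness_of_breath": ["check_oxygen_saturation"],
    },
    "lung_surgery": {
        "pre_operative": ["verify_anesthesia_plan"],
    },
}


def applicable_rules(topic, state_index):
    """Return the rules indexed under state_index within one partition."""
    # Factorization: only the relevant partition is consulted; the others
    # are never searched.
    partition = KNOWLEDGE_BASE[topic]
    # Indexing: residence in a state retrieves its rules by direct lookup.
    return partition.get(state_index, [])


rules_for_cough = applicable_rules("lung_diagnosis", "persistent_cough")
```

A diagnostic query in this sketch never touches the surgical partition, which corresponds to the lung-disease example in the text.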

One  can  get  more  quantitative  in  search  constraint  by  utilizing  some 
kind  of  heuristic  (or  evaluation)  function.  This  falls  under  the  class  of 
specific purpose heuristics as denoted in Figure 6. The heuristic function
generates a weighting or cost factor to be associated with each node that
is  some  measure  of  the  "goodness"  of  the  solution  path  to  that  point. 
Different  types  of  problem  domains  have  different  opportunities  for 
defining  such  functions  (thus  the  notation  "specific  purpose");  however, 
frequently  used  measurement  concepts  are:  a  metric  representing  the  length 
or  difficulty  of  the  search  to  the  node  in  question,  or  a  metric 
representing  the  distance  to  the  goal  node  or  difference  between  the 
current  node  and  the  goal  node.  As  mentioned  earlier,  there  will  always  be 
a  tradeoff  between  the  search  time  saved  by  the  heuristic  functions  versus 
the  time  needed  to  compute  the  functions  themselves;  i.e.,  complex 
functions  may  provide  excellent  guidance  for  the  search,  but  the  time 
needed  to  compute  them  may  be  more  than  the  time  that  would  have  been  used 
by  a  more  random  search. 

Once a measure function has been defined, the search process can
proceed along what is known as a best-first search. Using this technique,
the nodes to be expanded at any given point in the search are the ones with
the best "goodness" measure. For example, consider the number beside each
node in the tree of Figure 8 to be the value of the heuristic function such
that the lower the number, the better to expand that node. Then the
best-first search would proceed as follows: A C G H B D K E F I J L.
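A best-first search can be sketched with a priority queue ordered by the heuristic value, lower values being expanded first as in the text. The tree and the heuristic values below are hypothetical; they do not reproduce Figure 8.

```python
import heapq

# Hypothetical tree and heuristic values (lower is better); not Figure 8.
TREE = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": [],
    "E": [],
    "F": [],
    "G": [],
}
H = {"A": 5, "B": 4, "C": 2, "D": 6, "E": 3, "F": 1, "G": 7}


def best_first(root):
    """Always expand the generated node with the lowest heuristic value."""
    order = []
    frontier = [(H[root], root)]  # min-heap of (heuristic value, node)
    while frontier:
        _, node = heapq.heappop(frontier)
        order.append(node)
        for child in TREE[node]:
            heapq.heappush(frontier, (H[child], child))
    return order


expansion_order = best_first("A")
```

Unlike depth-first or breadth-first ordering, the next node chosen may lie anywhere on the frontier, so the search can jump between branches as the heuristic values dictate.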


D. ATTRIBUTES OF SYMBOLIC COMPUTING AS COMPARED TO NUMERIC COMPUTING

At this point, the reader should be gaining a cursory view of what
constitutes  symbolic  computation.  This  understanding  can  be  enhanced 
further  by  making  direct  comparisons  between  symbolic  computing  and  the 
more  familiar  numeric  computing.  Figure  9  lists  many  of  the  attributes  of 
the  two  computational  methods  in  such  a  way  that  a  comparison  can  be  made 
between corresponding attributes in each column. The comparisons are
discussed below.

The logical inference is the analogue of the floating point operation.
Whereas the floating point operation is basic to the manipulation of
numeric symbols, the inference operation is used to manipulate the much
broader class of symbols encountered in symbolic computations. A logical
inference is generated by combining knowledge elements or groups of objects
to reach a conclusion. Whether the logical inference is done using
syllogisms (cascaded "if/then" statements), graphical linking, frame
matching, or logical inference techniques depends considerably on the
knowledge representation used (production system, semantic network, frames,
or first-order logic, respectively). But it is the combination and
manipulation of symbolic data structures (e.g., objects and their
attributes) by logical inferencing that is the basis for most reasoning
techniques in symbolic computation.

Figure 9. Attributes of Symbolic and Numeric Computation

The well-structured data formats of vectors, matrices, etc. used in
numeric computing give way to data structures that can change their shapes
in symbolic processing. In performing tasks such as navigating over
unknown terrain, playing a game, or carrying on a dialogue, one knows prior
to performance of the task only the general form the data must take to
represent the route planning, the responses to the opponents' actions, or
the meaning of the spoken phrases, respectively. Therefore, symbolic
processors use lists of objects connected by pointers, such as those
discussed in previous sections of this chapter. The input data, the data
manipulated, and the end point in the manipulation of that data are usually
a phrase, a concept, or some other symbolic structure of unconstrained size
and/or shape, rather than data structures with specified dimensions as in
numeric processing. Further, dynamic data structures allow the machine to
deal with inexact information or to arrive at uncertain or inexact
conclusions.

This  use  of  inexact  or  incomplete  information  gets  to  the  heart  of  the 
differences  in  using  symbolic  versus  numeric  data.  The  algorithms  used  on 
numeric  computers  have  little  ability  to  generate  correct  numerical  output 
from  qualitative  descriptors  or  from  incomplete  numerical  input  data.  They 
depend  upon  exact  data  since  all  fundamental  operations  are  combinations  of 
numeric  entities.  On  the  other  hand,  it  is  a  feature  of  most  symbolic 
representations  that  conclusions  can  be  reached  in  a  qualitative  sense  by 
generating relations among the functional attributes. This is possible
even when the input information is incomplete, or, in some cases, inexact.
It should be noted that such techniques are at the forefront of today's
research in symbolic computation, and comprise a discipline called
reasoning with uncertainty (ref. 6).

It should be obvious that in order to take advantage of the power of
symbolic computing, such as the use of inexact or incomplete knowledge in
reasoning, specialized programming languages are required. We should note
that the numeric programming languages, such as FORTRAN, could, in
the hands of clever programmers, be used in symbolic computing. However,
they have not been optimized for such use and would make symbolic
programming a very cumbersome task. The LISP language, and its major
derivatives INTERLISP and MACLISP, create a programming environment that
facilitates the manipulation of symbolic expressions characterized by
flexible data structures. The semantic meanings of objects are readily
changed by adding to and deleting from the variable lists of attributes.
Another notable characteristic of LISP is its recursive nature: any LISP
function can call itself and any program can be defined in terms of itself.
Such a capability facilitates the search of lists of indefinite length
for certain elements (attributes) in the lists.
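The recursive style of list search described above can be sketched as follows. The example is written in Python rather than LISP purely for illustration, and the attribute list is hypothetical; the function tests the first item of the list and then calls itself on the remainder, descending into nested sublists as it goes.

```python
def contains(element, items):
    """Recursively search a (possibly nested) list for element.

    Mirrors the classic first/rest recursion: examine the first item,
    then recurse on the rest of the list.
    """
    if not items:
        return False  # empty list: nothing left to examine
    first, rest = items[0], items[1:]
    if isinstance(first, list):
        if contains(element, first):  # descend into a nested sublist
            return True
    elif first == element:
        return True
    return contains(element, rest)


# Hypothetical attribute list with one nested sublist.
attributes = ["laser", ["monochromatic", "coherent"], "bright"]
found = contains("coherent", attributes)
```

Because the function is defined in terms of itself, it needs no advance knowledge of the length or nesting depth of the list, although in Python the default recursion limit bounds how long such a list can be.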

A very different language is PROLOG (PROgramming in LOGic). Based on
production  rules,  it  uses  pattern  matching  techniques  to  prove  or  disprove 
program  statements.  The  language  relies  heavily  on  predicate  calculus  in 
establishing  relationships  between  objects.  The  fundamental  operation  in 
PROLOG  is  the  logical  proof  of  some  condition  or  relationship  starting  with 
a  set  of  more  primitive  conditions. 

The sixth element in Figure 9, the independence of control and problem
knowledge, represents a drastic difference in the way symbolic systems
process information. In symbolic computation, the control refers to any
process, explicit or implicit, which governs the order of problem solving
activities (refs. 3, 4). A key aspect of this occurs in expert systems
(discussed in Section III.E) where the actual structuring of the solution
strategy can be changed recursively based upon changes in the program's
evolution. This is very different from a numeric computation, where
changes, even conditional branches within a program, are input "a priori"
to the system. It is this independence of the knowledge base from the
control activities that allows an expert system "shell" to be robust enough
to be applied to more than one problem domain. As an example, although the
operating program MYCIN was originally developed to aid in the medical
diagnosis of bacterial infections, it may be applied to a knowledge base of
crystallographic information to aid crystal growers. Instead of containing
rules relating symptoms, bacteria, and remedies, the knowledge base would
contain rules relating measurements, crystallographic structure, and
recommendations for regrowth. In the numeric domain, one cannot take a
program written for, say, VLSI design and adapt it for lens design with
only a change in the input data.
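The separation of control from problem knowledge can be sketched with a tiny forward-chaining loop that knows nothing about any particular domain. The two rule sets below are hypothetical stand-ins for a medical and a crystallographic knowledge base; they are invented for illustration and are not MYCIN rules.

```python
def forward_chain(rules, facts):
    """Domain-independent control: fire rules until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its "if" conditions are known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Two hypothetical knowledge bases served by the same control procedure.
MEDICAL_RULES = [
    ({"fever", "gram_negative_rods"}, "suspect_bacterial_infection"),
    ({"suspect_bacterial_infection"}, "recommend_culture"),
]
CRYSTAL_RULES = [
    ({"twinning_observed"}, "recommend_slower_regrowth"),
]

medical_result = forward_chain(MEDICAL_RULES, {"fever", "gram_negative_rods"})
crystal_result = forward_chain(CRYSTAL_RULES, {"twinning_observed"})
```

Moving from one domain to the other requires only a different rule list; the `forward_chain` procedure itself is unchanged, which is the property the "shell" concept relies on.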

Even the way one makes changes to symbolic programs differs from
techniques in the numeric domain. If a change is required in the program
of a numeric computer, the entire program, or at least the macro in which
the change is to be made, must be pulled up, edited, and returned to the
system, at which time the macro or the entire code must be recompiled. In
programming environments built upon LISP one can modify the program without
such activity. LISP programs and data sets are both written in the same
syntax and form. As a result, a LISP program can manipulate (alter)
another LISP program or data base. It can automatically modify faulty
rules or knowledge, or it can call suspect rules or knowledge to the
attention of the operator who can then make changes interactively without
having to recompile the entire program. These environments are very
powerful, allowing the operator to slide a window through the program which
allows one to see in real time the outcome of a change in the knowledge
base or rule structure anywhere in that window. At the present time, this
form of software engineering is not available in numerical computers, for
which the time-consuming debugging cycle is: find bugs, rewrite,
recompile, rerun, look for new bugs, rewrite, recompile, rerun, etc.

If making changes to symbolic programs differs from numeric practice,
then it is not altogether unexpected that the power of symbolic computing
can be used in the debugging phase of program development as well. It is
certainly possible to build into a numeric computer program the ability to
print out messages indicating which "if/then" paths were taken during
execution of the program. In a symbolic computer, however, not only can
the machine trace for the operator the path taken to the solution, but also
it can tell the operator why it took each path and not others. This
feature is useful for several reasons, only two of which are stated here.
First of all, it is a valuable diagnostic tool; in fact, it is an integral
part of most development environments designed for use with LISP machines.
Second, human beings want to know "why?"; that is, we tend to ask how some
conclusion was reached by another human in order to form some opinion about
its validity; and one would not expect a machine replacing a human expert
to be exempt from having to justify its own conclusions. This leads the
user to a feeling of confidence in the machine.
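One simple way such a "why?" facility might be realized is to record, for every conclusion, the conditions that justified it. The sketch below is an assumption about how this could be done in a rule-based system, not a description of any particular LISP environment; the rule and fact names are hypothetical.

```python
def forward_chain_with_trace(rules, facts):
    """Fire rules to exhaustion, recording why each conclusion was drawn."""
    known = set(facts)
    why = {}  # conclusion -> sorted list of conditions that justified it
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                why[conclusion] = sorted(conditions)
                changed = True
    return known, why


def explain(conclusion, why):
    """Answer the operator's question 'why?' for a single conclusion."""
    if conclusion not in why:
        return f"{conclusion} was given as input."
    return f"{conclusion} because " + " and ".join(why[conclusion])


# Hypothetical rule and facts.
RULES = [({"fever", "cough"}, "suspect_infection")]
known, why = forward_chain_with_trace(RULES, {"fever", "cough"})
answer = explain("suspect_infection", why)
```

The justification record costs little to maintain during inference, and it serves both uses named in the text: diagnosing faulty rules and letting a user judge the validity of a conclusion.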

Having  looked  at  the  similarities  and  differences  between  symbolic  and 
numeric  computing,  the  reader  can  begin  to  understand  some  of  the  ways  in 
which  symbolic  computing  can  be  applied  to  real-world  problems.  The 
earlier  discussions  on  knowledge  representation  and  search  strategies  will 
be  built  upon  in  the  next  section,  where  the  reader  will  see  both  how  these 
techniques  are  specialized  and  modified  as  well  as  how  they  can  lead  to 
performance problems. This is particularly true of the search process,
which is very often the cause of the computational bottlenecks in AI
systems, and the discovery of ways in which optics can impact these
operations may be a key to optical symbolic computing. Before discussing
these opportunities, let us take a look at the symbolic computing domains
of speech recognition, vision/image understanding, natural language
processing, and expert systems.

CHAPTER III

SYMBOLIC COMPUTATION: FUNCTIONAL CAPABILITIES

The previous section highlighted the underlying principles of symbolic
computation, providing both an introduction to the subject and a discussion
of the types of representations employed and search strategies utilized.
This section will extend that discussion by first presenting an overview of
the four main disciplines which comprise symbolic computation: speech
understanding, vision, natural language understanding, and expert systems.
These applications can take several forms, each of which draws upon aspects
of knowledge acquisition, reasoning, and retrieval. If they have one
aspect in common, though, it is the use of knowledge about the system or
domain under consideration to improve the machine's understanding of input
information.


A.  OVERVIEW  OF  SYMBOLIC  COMPUTING  DOMAINS 

As in any new field, there is considerable debate as to what
constitutes AI and what the appropriate taxonomy of AI disciplines should
be. Complicating matters is the fact that the overall nature of the
programming tasks and the optimal computing structures for them are still
not well understood (refs. 1-3). Nevertheless, major advances have been
made in applying the knowledge-based techniques presented in the previous
section to a wide variety of problems. As a result, symbolic computing
research is currently concentrated in four areas which form the basis of
generic machine understanding capabilities: speech recognition and
understanding, vision or image understanding, natural language
understanding, and expert systems and reasoning.

To enable a machine to respond to and identify spoken language, we can
apply advanced pattern recognition techniques to process the input signal
and recognize the words. This aspect of speech research is termed speech
recognition. Language is typically entered into a speech recognition
system by means of a microphone (ref. 4). Recognition can occur in any of
several ways, but each typically involves performing a digital comparison
of the input phrase or sentence with elements stored in the computer's
memory. Speech understanding, on the other hand, focuses on the use of
knowledge to attach meaning to a series of spoken inputs. Although no real
boundaries exist, in most cases speech recognition is referred to as
low-level speech, emphasizing signal processing and template matching;
speech understanding, by virtue of its reliance upon knowledge processing,
is termed high-level speech. The goal of all speech research, obviously,
is to achieve totally speaker independent, high accuracy recognition over a
large vocabulary base.

Vision, or image understanding, as it is more commonly known in
computer science, refers to the ability of a machine or computer to
understand scenes utilizing a visual input. Such systems have the goal
(ref. 5) of pattern and image understanding with a degree of accuracy that
parallels human vision systems. Recognition can be accomplished in many
ways, but most involve matching elements of an observed scene to objects
represented in the system's knowledge base. This is similar to the problem
for speech understanding systems, except that the information is comprised
of input images rather than waveforms, and the objects themselves are three
dimensional. Since most visual input devices are two-dimensional imaging
arrays, such as solid-state TV cameras, much emphasis has been placed on
achieving some level of full three-dimensional information from the input
scene. By analogy with speech, image processing functions such as
preprocessing, image restoration, or gradient calculations are termed
low-level vision. Any processing requiring interactions with the knowledge
base is known as high-level vision.

Natural language understanding focuses on the ability of a machine to
attach meaning to English language (or some other written language) phrases
input to it through a peripheral device, typically a keyboard (see
Figure 10). To be effective, therefore, natural language understanding
systems should incorporate knowledge of linguistic theory, such as sentence
decomposition according to the syntax of the language (parsing) and
assigning meaning to words or phrases (semantics) (ref. 6). Thus natural
language processing is a very knowledge intensive activity, relying upon
knowledge of the grammar and the context to attach meaning to the input
sentence or phrase. Knowledge of these and other linguistic attributes
must be included within the system's knowledge base, and the challenge, as
in other AI disciplines, lies in resolving and implementing one of many
different strategies for interpreting input, such as English sentences.

Expert systems form a discipline within applied artificial intelligence
which seeks to emulate human expertise in specialized areas, known in AI
parlance as domains. Examples include the interpretation of spectral data
(the DENDRAL project) (ref. 7), the configuration of minicomputers from
components (the R1 project) (ref. 8), and the diagnosis of diseases in
internal medicine (the MYCIN project) (ref. 9). These computer systems
"achieve high levels of performance in task areas that, for human beings,
require years of special education and training" (ref. 10). Here,
expertise is defined as the "set of capabilities that underlies the high
performance of human experts, including extensive domain knowledge,
heuristic rules that simplify and improve approaches to problem solving,
metaknowledge and metacognition, and compiled forms of behavior that afford
great economy in skilled performance." (The prefix meta- refers to
knowledge about the root word.) These systems have been very successful
recently (ref. 10) in solving problems and tasks which are knowledge and
heuristic intensive, those that would typically take a human expert between
8 and 40 hours to accomplish. Using the capabilities of LISP machines (see
Figure 11) and specialized software development tools, expert systems
research seeks to capture human expertise accurately enough to apply it to
more complex, demanding, and diverse tasks.

These functional capabilities are presented in this order for several
reasons. Historically, the first successful applications of AI research
centered on the human/computer interface, either spoken or visual, and on
game playing by machines. (For a nice review of early AI research, the
reader is referred to reference 6, Vol. 1.) This has evolved into what we
now know as low-level processing, comprised mainly of signal and image
processing, with knowledge based techniques naturally assigned the role of
higher level processing. By virtue of their long history, speech and
vision research are thought to be more mature disciplines than natural
language or expert systems. As an example, many of the concepts required
for natural language processing had their genesis in linguistic theory and
in higher level speech research. And the explanation facilities in expert
systems have in turn developed out of natural language research.

Second, there is a natural evolution from capabilities which primarily
focus on the man-machine interface to those which emphasize computational
reasoning with minimal human interaction. Finally, this order will allow us
to minimize the amount of overlap and repetition of concepts. In the
ensuing discussions, we will see a great deal of commonality among these
disciplines, and in many cases this is a result of critical ideas
transitioning from one area to another. The use of the blackboard
architecture (to be discussed in Section III.B) is a case in point.
Originally developed for the Hearsay speech understanding system, it has
now been successfully applied in natural language and expert systems.

In  what  follows,  each  of  these  symbolic  computing  capabilities  will  be 
discussed  in  greater  detail,  with  an  eye  towards  identifying  the  underlying 
technology.  In  doing  this,  the  major  challenges  to  current  research  in 
each  field  will  be  highlighted.  Each  subsection  will  conclude  with  a 
review  of  current  and  projected  machine  intelligence  capabilities,  and  an 
analysis  of  the  problems  and  computational  bottlenecks  impeding  the  progress 
of  each  AI  research  area. 

B.  SPEECH  UNDERSTANDING 


Most speech understanding activities attempt to endow computer systems
with the required knowledge and auditory discrimination to achieve
real-time machine understanding of continuous spoken input. Unfortunately
for computer systems, humans have not standardized on any particular form
of word pronunciation. Thus, even systems which seek to recognize speech
input (but not necessarily understand it) are faced with the problem of
distinguishing words independent of speaker, the rate of speech, any
dialects or accents, vocabulary, dropped syllables, and last, but not
least, the background noise environment.

This  is  obviously  a  formidable  problem,  since  humans  themselves  do  not 
yet  have  a  reliable,  100  percent  speaker  independent  speech  understanding 
system.  We  have  problems  understanding  speech  in  which  adjacent  words  are 
run  together,  influencing  the  pronunciation,  as  is  typical  with  geographic 
references  such  as  Long  Island,  which  is  often  spoken  as  Longoilan.  We 
still  have  trouble  with  different  dialects,  e.g.,  the  word  oil  in  various 
sections  of  the  country  is  pronounced  as  all  or  as  ole,  and  very  often  we 
have  to  infer  words  or  missing  phrases  from  context.  This  last  point 
embodies  the  critical  challenge  for  knowledge  retrieval  and  reasoning  in 
speech  systems  -  the  use  of  pattern  matching  techniques  for  word 
identification,  and  knowledge  for  understanding  meaning  and  for 
interpretation. The question is how to apply the knowledge, and when (see
Figure 12).

Many techniques have been developed for knowledge-based interaction in
speech, but the two most developed are the isolated word recognition (IWR)
systems and the continuous speech understanding (CSU) systems.11 The
isolated word systems focus on word identification, using segmented
spoken input, such as:

"helium-neon...laser...broken"
or:

"weather...today...rain"

to  decrease  problems  associated  with  identifying  the  beginning  and  ends  of 
words.  The  problem  of  identifying  the  word  from  the  transformed  acoustic 
input  still  remains,  but  is  handled  by  template  matching.  This  process, 
shown  in  Figure  13,  involves  taking  the  input  signal  and  identifying  key 
features,  such  as  the  major  sound  variations  and  the  beginnings  and  ends  of 
words. The next step is dynamic time warping, which seeks to align the
length of the spoken word with some internal reference length. As an
example, the words helium-neon and Caribbean are often pronounced as:

"helium-neon"          "Care-ih-be-yan"

"heel-e-yum nee-un"    "Cah-rib-e-an"

"heelyum-neun"         "Cah-rib-be-yun"

each of which is spoken at a different rate. To standardize on the lengths
of words, the signal, as a function of time, is stretched or compressed, in
order to make it compatible with an internal reference length. The
features of this "standardized word" are then compared with templates
stored in memory, and weighted metrics, such as those shown in the figure,
are used to match the word with its counterpart in memory.

Figure 12. Speech Understanding Paradigm

Figure 13. Low-Level Speech Processing

It is here that we see the greatest bottlenecks in the low-level
process. To give an idea of the computation rate, approximately 500 metric
computations are typically performed per vocabulary word using dynamic
time warping during a 500 msec utterance.13 Using this as a basis, the
total number of multiplies (or inner product computations) per word
ranges from 5 K to 10 K, using the weighted Euclidean metric for filter
bank features14 and the Itakura metric for linear predictive coding (LPC)
features.15 Therefore, an optimal inner product processing environment,
such as optics, should have great utility in speech recognition.
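
The template match described above can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not the
algorithm of any particular system: the function names, the two-element
feature vectors, and the unit weights are all hypothetical, and the local
distance is a weighted squared Euclidean metric of the kind mentioned for
filter bank features.

```python
def weighted_euclidean(a, b, weights):
    """Weighted squared Euclidean distance between two feature vectors."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def dtw_distance(utterance, template, weights):
    """Accumulated cost of the best time alignment of two frame sequences."""
    n, m = len(utterance), len(template)
    inf = float("inf")
    # cost[i][j] is the best cost of aligning the first i utterance frames
    # with the first j template frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = weighted_euclidean(utterance[i - 1], template[j - 1], weights)
            cost[i][j] = d + min(
                cost[i - 1][j],      # template frame j is held (repeated)
                cost[i][j - 1],      # utterance frame i is held (repeated)
                cost[i - 1][j - 1],  # both sequences advance one frame
            )
    return cost[n][m]


def recognize(utterance, templates, weights):
    """Return the vocabulary word whose template has the lowest DTW cost."""
    return min(templates,
               key=lambda word: dtw_distance(utterance, templates[word], weights))


# Hypothetical two-feature templates for a two-word vocabulary.
templates = {
    "rain": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "today": [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]],
}
weights = [1.0, 1.0]
# The same frames as "rain", spoken more slowly (frames repeated).
slow_rain = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
best_word = recognize(slow_rain, templates, weights)
```

Each evaluation of weighted_euclidean is one metric computation, and its
cost is dominated by multiplies. Taking the figures quoted above at face
value, 5 K to 10 K multiplies spread over roughly 500 metric computations
amounts to about 10 to 20 multiplies per metric evaluation.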

The template match is not the only computational bottleneck,
unfortunately. The application of high-level knowledge, which is a
retrieval and reasoning problem, also limits the accuracy and timeliness of
the system, preventing real-time operation. Figure 14 gives an example of
how knowledge is used to obtain meaning from the input phrase. Here,
successive matchings with words in the network allow the machine to
interpret the phrase. The reader should note that this is just one of a
number of possible representation techniques, and speech systems have
evolved which have used frames, scripts, and rules as well as semantic
networks.

There are many types of knowledge that can be used in understanding
speech. Many of these are derived from linguistic considerations, and we
can therefore expect that they will be important in the discussion of
natural language as well. Each of these is specific to the vocabulary
under consideration, and is heavily dependent upon the rules associated
with the grammatical structure of the language. Thus it is important for
the machine to understand that adjectives modify nouns, adverbs modify
verbs, etc.

The most obvious, and probably the most familiar type of knowledge, is
phonetics, which refers to the physical characteristics of the sounds of
each word in the vocabulary; it is the acoustic signature of the words.
Another important type of knowledge is morphology, which refers to the ways
in which the basic units of words (basic morphemes) can be combined to form
new words, plurals, tenses, etc. Using these types of knowledge as a
basis, higher level knowledge attributes can be used to determine meaning
from spoken input. In the introduction to this section, we introduced the
concepts of syntax (the sentence structure and grammar) and semantics (the
ways in which word meanings are combined to form the meanings of sentences
and phrases) in the discussion on natural language. It should be clear
that these types of knowledge are necessary for any speech recognition
system as well.

The last type of knowledge, and possibly the most important for
high-level speech, is pragmatics. Pragmatics refers to the rules of
conversation and dialogue, which allow the system to distinguish questions
and queries from factual statements. In the following examples, the
statements all have the same syntactic and semantic meaning, yet each
represents a different type of interaction with the machine:

(1) "The part in the helium-neon laser is broken."

(2) "Is the part in the helium-neon laser broken?"

(3) "What part of the helium-neon laser is broken?"

The first is a simple factual statement, whereas the second is a
rearrangement of the first, requiring a yes or no response from the system.
In the third, however, a yes or no response is insufficient; another level
of response is called for from the system. It is the pragmatic knowledge of
the vocabulary that allows the system to recognize and understand
differences in sentence and phrase meanings.
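
To make the distinction concrete, the three example sentences can be
separated by a toy rule that looks only at the first word. This is a
deliberately crude sketch and not a model of how any speech system applies
pragmatic knowledge; the word lists and the name interaction_type are
assumptions introduced here for illustration.

```python
# Hypothetical word lists; a real system would need far richer knowledge.
WH_WORDS = {"what", "which", "who", "whom", "whose",
            "where", "when", "why", "how"}
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did",
               "can", "could", "will", "would", "has", "have"}


def interaction_type(sentence):
    """Classify a sentence by the kind of response it calls for."""
    words = sentence.strip().rstrip(".?!").lower().split()
    if not words:
        return "empty"
    if words[0] in WH_WORDS:
        return "wh-question"       # needs a content answer
    if words[0] in AUXILIARIES:
        return "yes-no-question"   # needs only yes or no
    return "statement"             # asserts a fact, no answer required


examples = [
    "The part in the helium-neon laser is broken.",
    "Is the part in the helium-neon laser broken?",
    "What part of the helium-neon laser is broken?",
]
labels = [interaction_type(s) for s in examples]
```

The rule deliberately ignores punctuation, since spoken input carries no
question mark; the cue must come from word order or other knowledge.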

These types of knowledge could have been incorporated into our earlier
IWR example, but they were not required. The scheme for using knowledge in
the IWR example is very general and is applicable to most types of
knowledge, and is referred to as the bottom-up hierarchical paradigm.
Here, bottom-up implies the use of numerical techniques at the start of the
algorithm to improve the signal and perform all feature extractions, which
collectively are known as low-level speech processing (see Figure 12). The
high-level processing is the use of knowledge, and in this case occurs
after initial low-level processing has been completed. Hierarchical refers
to the passage of system control from the low level, through a series of
intermediate, sequential steps, to the point where the knowledge
interactions occur.

This paradigm is not unique, and as we shall see in our discussion of
continuous speech (below), can be structured as a top-down strategy, or
even a middle-out scheme. In each case, however, the overriding
consideration is where and when the knowledge about the problem is
retrieved and utilized. If it is at the beginning, chances are that we are
dealing with a top-down paradigm; if it is at the end, as in the case of
the isolated word example, it is a bottom-up scheme.

In  continuous  speech  understanding,  grammatic  or  linguistic  models  are 
used  to  constrain  the  knowledge  retrieval  process,  which  means  limiting  the 
active  vocabulary  or  the  number  of  words  possible  at  any  instant  in  time  in 
a  CSU  system.  Not  only  is  processing  time  saved  by  limiting  the  size  of 
the  search  space  for  each  word,  but  the  potential  for  confusion  is  lowered 
and  a  higher  recognition  rate  results. 
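
The effect of such a constraint can be sketched with a small finite-state
grammar that lists which words may follow at each point in a phrase. The
grammar, the acoustic scores, and the greedy word-by-word choice below are
all hypothetical and chosen only to show how limiting the active vocabulary
removes acoustically plausible but ungrammatical candidates.

```python
# Hypothetical grammar: each state maps an allowed word to the next state.
GRAMMAR = {
    "START": {"the": "DET"},
    "DET": {"laser": "NOUN", "part": "NOUN"},
    "NOUN": {"is": "VERB"},
    "VERB": {"broken": "END", "working": "END"},
    "END": {},
}


def active_vocabulary(state):
    """Words the grammar permits at this point in the phrase."""
    return set(GRAMMAR[state])


def decode(candidates_per_slot):
    """Greedy decoding: keep the best-scoring candidate the grammar allows.

    candidates_per_slot is a list of dicts mapping a candidate word to an
    acoustic match score (higher is better), one dict per word position.
    Returns None if no candidate is grammatical at some position.
    """
    state, words = "START", []
    for scores in candidates_per_slot:
        allowed = {w: s for w, s in scores.items() if w in GRAMMAR[state]}
        if not allowed:
            return None
        best = max(allowed, key=allowed.get)
        words.append(best)
        state = GRAMMAR[state][best]
    return words


# "razor" and "spoken" score higher acoustically but are never searched.
slots = [
    {"the": 0.9, "a": 0.4},
    {"razor": 0.8, "laser": 0.7},
    {"is": 0.9},
    {"spoken": 0.7, "broken": 0.6},
]
phrase = decode(slots)
```

At the second position only two words are active rather than the whole
vocabulary, which is the source of both the time saving and the lower
confusion rate.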

The basic CSU system consists of an acoustic processor, which carries
out the initial signal to symbol transformations, and a linguistic decoder,
which applies knowledge to understand the spoken input. Typical CSU
systems take two forms: recognizing individual words in random
arrangements, such as for data base retrieval, and understanding
meaningful, continuously spoken sentences. The two have some features in
common with IWR, as well as a few differences, but the principal difference
is in the area of word segmentation.

Word  segmentation,  or  determining  the  beginning  and  end  points  of 
individual  words,  is  a  critical  problem  in  CSU.  Since  words  spoken  in 
continuous  speech  tend  to  run  together,  the  end-point  detection  algorithms 
used  in  IWR  will  not  work.  Several  methods  exist  for  overcoming  this 
problem,  and  all  are  variations  on  the  dynamic  time  warping  algorithm 
discussed  earlier.  This  still  remains  as  a  processing  bottleneck,  although 
not  as  severe  as  the  template  match  or  the  knowledge  retrieval  process. 

Most continuous speech understanding systems are top-down, knowledge
directed systems, using knowledge about the speaker's problem to limit or
constrain the expected content of the input.11 Such a technique has been
successfully implemented in the Hearsay-III system,12 where each type of
knowledge can interact with the partially processed input signal
independently of the others, as shown in Figure 15. The novel feature of
this system's architecture is the blackboard, an area in memory allocated
for communicating between the various knowledge "experts." By
allowing each type of knowledge to operate on the input signal, we have in
fact developed individual speech "expert systems" (see Section III.E).
This blackboard concept, which allows cooperation among multiple knowledge
bases, has been successfully applied to both natural language understanding
systems and to expert systems.
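
The blackboard idea can be illustrated with a short sketch in which
independent knowledge sources read from and write to a shared area of
memory. This is not the Hearsay implementation: the class, the two
knowledge sources, and the use of a text string to stand in for the
acoustic signal are assumptions made purely for illustration.

```python
class Blackboard:
    """Shared memory area where knowledge sources post hypotheses."""

    def __init__(self, signal):
        self.entries = {"signal": signal}


def phonetic_source(bb):
    """Hypothetical expert: turns the raw signal into word candidates."""
    if "signal" in bb.entries and "words" not in bb.entries:
        bb.entries["words"] = bb.entries["signal"].lower().split()
        return True
    return False


def syntactic_source(bb):
    """Hypothetical expert: decides whether the words form a question."""
    if "words" in bb.entries and "is_question" not in bb.entries:
        bb.entries["is_question"] = bb.entries["words"][0] in {"is", "what"}
        return True
    return False


def run(bb, sources):
    """Invoke every source repeatedly until none can contribute further."""
    progress = True
    while progress:
        progress = False
        for source in sources:
            if source(bb):
                progress = True
    return bb.entries


# The sources are listed in the "wrong" order on purpose; each one acts
# only when the entries it needs have appeared on the blackboard.
result = run(Blackboard("Is the laser broken"),
             [syntactic_source, phonetic_source])
```

No knowledge source calls another directly. Each communicates only through
the blackboard, which is why additional experts can be added without
changing the existing ones.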

The use of heuristic search to constrain the input is not without its
price, and that is the tradeoff between generality and performance. This
will be a theme that we will see throughout this section, and one which
underlies all AI research at the present time. To appreciate this
tradeoff, the reader should recognize that the vocabulary of the English
language is very large; the 24 volumes of the Oxford English Dictionary
are a testament to that. Most human beings have a usable vocabulary
much smaller than that, yet even 20,000 words are staggering when one looks
at the number of possible combinations and pronunciations associated with
each word. Early speech systems, which sought to include most words within
the typical human vocabulary, were never able to even approximate real-time
operation, and spent far too much time searching memory. The solution to
this bottleneck, borrowing from cognitive psychology, is the use of
knowledge about the problem or domain to constrain the search. Once this
was realized, near-real-time speech systems became possible, as well as the
recent successes of vision, natural language and expert systems.

The reader should note that much of the recent improvement in the
performance of both IWR and CSU systems is due to developments in special
purpose signal processors. These include special purpose ICs, as well as
systolic structures for processing lower level speech. The remaining
critical bottleneck, that of knowledge retrieval and processing, requires
additional research into symbolic computing, in order to structure an
optimal high-level speech processing environment. (Several such structures
are suggested in Section IV, where the relationship between processing
structure and function will be examined more closely.) Special purpose
processors are also under consideration for vision systems, as we shall see
in the next section.

C.  VISION 


Vision, as its name implies, involves providing a machine with the
capability to understand visual input in real time. As in human vision
systems, machine vision includes the identification of what is in the image
or scene, and how the various elements are related to one another, both
spatially and temporally. Such systems have broad application, both
commercially and militarily, beyond their use as an input device to ease
man-machine communications. Many of the recent advances in process
automation and robotics (e.g., bin-picking robots) have become possible
because of machine vision research. Other commercial sector applications
include: sensors for automated welding, handling hazardous materials, VLSI
manufacturing, computer aided design and manufacturing (CAD/CAM);
inspection of manufactured goods; medical imaging; remote sensing for
cartography, traffic monitoring, driving aids, land use management, and
exploration for oil and minerals. Potential military applications are just
as diverse, such as autonomous vehicle navigation, photointerpretation,
reconnaissance, target acquisition and range finding, terminal homing, and
tracking of moving objects. We also see it as the most obvious application
for optics, since visual information is inherently optical in nature and
can be processed two-dimensionally.

In many ways, the techniques used in machine vision research build on
the knowledge-based systems presented in the previous section on speech
understanding. As in speech, high-level knowledge about the scene or image
under consideration is effectively used to constrain the object
identification and understanding process. Similarly, in vision, this
high-level knowledge specific to the scene can be effectively used in
guiding lower level operations. Later on in this section we will see
examples of how knowledge can be applied to constrain the processing.

Another parallel between speech and vision is the approach to the use of
knowledge and control. Early work in vision was carried out using the
bottom-up hierarchical paradigm, similar to the case in isolated word
recognition systems. With this approach, preliminary processing performs
edge detection, feature extraction, and linking without the benefit of
knowledge-based inferencing. As in the case of speech, this preliminary
processing is referred to as low-level vision, and the subsequent
interaction with the knowledge base is termed high-level vision processing.

Currently, developers are exploring advantages of mixing the high- and
low-level processing in paradigms of broader scope. For example,
high-level reasoning about expected features and their relationships is
useful in tasking and guiding the lower level processing. The utility of
this stems from the efficiency of focusing on specific regions of an image
for a specific purpose, such as resolving an edge and following a lineal
feature. At other stages in the vision process, the low-level processing
may be involved in generating hypotheses to be evaluated by the top level
processes. In this case, the efficiency of the process is improved by
avoiding hypotheses which are inconsistent with the image.

Finally, just as speech systems incorporate pattern recognition
techniques in the low-level processing, vision research has also developed
as an outgrowth of early pattern recognition work. However, treating
vision as mere pattern recognition is inaccurate; pattern recognition is
strictly numerical in its approach, operating directly on the image.
Vision, in contrast, uses knowledge about the scene to categorize the
image, and operates on the symbolic representation of the image rather than
the image itself. This is a major and important distinction, and is
analogous to the use of syntactic knowledge in understanding language (see
natural language understanding in Section III.D).

Having introduced the concept of machine vision, we would like to
describe, in an elementary fashion, the types of processes and computations
associated with vision. Detailed discussions are beyond the scope of this
text, but the reader is referred to excellent books on the subject by Marr
and by Ballard and Brown for additional information.

For simplicity, in the following discussion of vision, the processing
will be considered to originate with the pixels on the visual detector,
which is usually some form of two dimensional imaging array, such as a TV
camera or vidicon. As in speech, low-level processing focuses on the
transformation from signal-level to symbolic information. This information
is in the form of intensity variations across a two dimensional array, and
the relative positioning of those variations. Color and texture
information can also be utilized, since both can be extracted from the
input image. The processing procedure begins with extraction of features
via the processing of pixels to find edges and regions, as shown in
Figure 16.

The first step in this is the preprocessing stage, which prepares
an image for actual image processing. For example, image restoration may be
required to remove the effects of noise and optical or motion blur.
Geometric distortion correction can be required to remove the effects of
angle of view, to correct for common lens distortions, or other types of
distortions introduced in the process of presenting the image to the
computer. More than one sensor may be used in order to obtain, for
instance, a 3D image, or to take advantage of information in several
wavelength ranges. In such a case, the inputs of the several sensors can
be combined in the preprocessing step in an attempt to achieve maximum
fidelity.

Typically, the first operation after preprocessing is feature
extraction. This is usually accomplished by identifying gradients within
the image, using the "DOG" (difference of Gaussians) operator. This
operator, shown in Figure 17, is created, as its name implies, by
subtracting one Gaussian operator from another, which yields an operator
with two zero crossings. Application of this operator to the image creates
a distribution of intensity gradients, and in particular, identifies the
edges of objects.
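
A one-dimensional sketch shows how the operator locates an edge. The code
below builds a sampled DOG kernel, applies it to a single scan line
containing a step in intensity, and reports where the filtered response
changes sign. The Gaussian widths (1.0 and 1.6 samples), the kernel radius,
and all function names are assumptions chosen for illustration; they are
not taken from Figure 17.

```python
import math


def gaussian(x, sigma):
    """Value of a unit-area Gaussian of width sigma at position x."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def dog_kernel(sigma_narrow, sigma_wide, radius):
    """Sampled difference of a narrow and a wide Gaussian."""
    return [gaussian(x, sigma_narrow) - gaussian(x, sigma_wide)
            for x in range(-radius, radius + 1)]


def filter_1d(signal, kernel):
    """Slide the kernel along the signal, keeping only full overlaps.

    The kernel is symmetric, so correlation and convolution coincide.
    """
    span = len(kernel)
    return [sum(k * signal[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal) - span + 1)]


def zero_crossings(values, eps=1e-9):
    """Indices i where values[i] and values[i + 1] have opposite signs."""
    return [i for i in range(len(values) - 1)
            if (values[i] > eps and values[i + 1] < -eps)
            or (values[i] < -eps and values[i + 1] > eps)]


RADIUS = 6
kernel = dog_kernel(1.0, 1.6, RADIUS)

# A dark-to-bright step edge between samples 19 and 20 of the scan line.
scan_line = [0.0] * 20 + [1.0] * 20
response = filter_1d(scan_line, kernel)

# Shift by the kernel radius to express positions in scan-line samples.
edges = [i + RADIUS for i in zero_crossings(response)]
```

The kernel itself is positive at its center and negative on either side,
which gives the two zero crossings noted in the text. The filtered step
changes sign exactly once, at the location of the edge.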

The  next  step  in  the  processing  is  termed  the  primal  sketch  and  first- 
order  segmentation  phase.  By  deciding  such  things  as  where  the  edges  of 
the  objects  contained  in  the  scene  are,  which  lines  and  edges  at  one  part 
of  the  scene  are  continuations  of  lines  and  edges  from  other  parts  of  the 
scene,  and  what  regions  of  the  scene  belong  to  individual  objects,  a 
stereotypical  "sketch"  can  be  generated  from  which  recognizable  features 
can  be  extracted.  The  idea  here  is  to  remove  the  computation  from  pixel 
level  processing  as  quickly  as  possible. 

As an example, a tank is represented as moving on a roadway in
Figures 18a-d. The "DOG" operator extracts the edges of the tank, which
can  then  be  linked  during  the  primal  sketch  phase.  For  the  tank  example, 
the  image  has  now  been  segmented  into  differing  regions,  corresponding  to 
the  turret,  the  base,  the  road,  and  the  wheels. 

Some method must be devised to recognize actual objects in the primal
sketch (for the tank, these are the wheels, cannon, road, etc.) and
relationships among those objects. This is done during the iconic feature
extraction and grouping step. Currently, there are several competing ways
to do this, and simply stated, they involve intensive, complex numerical
and symbolic computations. Symbolic representation of the segmented image
occurs at the next level, and finally the resultant symbol set of edges and
regions is semantically interpreted in combination with models generated at
the highest level. This overall system architecture is presented
graphically in Figure 19.

As an example of this process, we can look at the problem of
identifying that the image is in fact a tank. One aspect of this is
determining the identity and relation of the segments from the image in
Figure 18. A technique for doing this, based on the use of the "frame" (see
Section II.B), is shown in Figure 20. Here, the shape of the turret, and
its position relative to the base, match descriptions of similar objects in
the system's knowledge base. Within the frame, the turret is seen as
deriving from the class gun, and having two specific incarnations: the
tank turret and the gunboat turret. Within memory, the frame can call a
model of the turret up for symbolic comparison to the segmented image.
Then, by recursively repeating this process, the system can potentially
identify the tank base. The system can then hypothesize that the segments
of the image are elements of a tank. But this hypothesis is not validated
until the other elements in the scene have been identified.
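
The frame comparison just described can be sketched as follows. The frame
hierarchy mirrors the text (a gun class with tank turret and gunboat turret
incarnations), but the slot names and values are hypothetical, invented
here only to show how inherited descriptions are matched against the
features of a segmented image region.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """A named description with slots, optionally derived from a parent."""
    name: str
    parent: Optional["Frame"] = None
    slots: dict = field(default_factory=dict)

    def get(self, slot):
        """Look up a slot here, then in successively more general frames."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        return None


# Hypothetical slot values for illustration only.
gun = Frame("gun", slots={"has_barrel": True})
tank_turret = Frame("tank turret", parent=gun,
                    slots={"shape": "dome", "mounted_on": "tank base"})
gunboat_turret = Frame("gunboat turret", parent=gun,
                       slots={"shape": "box", "mounted_on": "deck"})


def match(segment, frames):
    """Names of the frames that agree with every observed feature."""
    return [f.name for f in frames
            if all(f.get(slot) == value for slot, value in segment.items())]


candidates = [tank_turret, gunboat_turret]
ambiguous = match({"has_barrel": True}, candidates)
resolved = match({"has_barrel": True, "shape": "dome"}, candidates)
```

With only the inherited feature observed, both incarnations remain
candidates; adding one distinguishing feature leaves a single hypothesis,
which would still need to be validated against the rest of the scene.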

As  one  might  expect,  this  process  is  a  significant  bottleneck  in 
machine  vision  systems,  even  with  the  use  of  domain  knowledge  to  constrain 
the  search  space.  Multiple  hypotheses  could  be  formed,  and  comparisons 
made  until  one  inference  emerges  without  contradiction.  This  offers  an 
opportunity  for  parallelism  in  this  computation,  but  at  present  remains  a 
critical  process  limiting  real  time  image  understanding.  However,  once 
object identification is made, the remainder of the computation is handled
symbolically.

This leads to the scene understanding phase of the computation, which,
unfortunately, is the least understood area of the enterprise. To say that
we are looking at an object with a turret or with a top cannon or with
tracks is far from understanding that it is a tank as opposed to a tracked
troop carrier or some other vehicle. To say that it is a tank is far from
understanding what type of tank it is or what it is doing. To do even the
simplest of these things will require application of sophisticated modeling
methods coupled with concise representations of these objects in the
knowledge base of the system. A commonly employed technique uses multiple
images to analyze movement or change, and then inferences are made
concerning these actions. However, this use of the time domain is not
always possible, and, therefore, techniques for scene understanding from
context are under development.

In  vision  processing,  knowledge  about  the  scene  can  be  used  at  all 
levels  to  constrain  the  processing.  Initially,  geometric  inferencing  could 
use  knowledge  of  occluded  sides  of  a  three-dimensional  geometric  object, 
such  as  a  tank  turret,  to  project  the  appropriate  2-D  model.  Domain- 
specific  knowledge  will  distinguish  a  road  from  an  airport  runway,  the 
difference  being  the  environments  around  roads  versus  runways.  Finally, 
identification  of  the  road  would  increase  the  likelihood  that  the  scene  was 
that  of  a  tank,  rather  than  a  gunboat. 

If we had implemented a different control strategy, rather than a
bottom-up one, knowledge about the scene would have constrained the system
in other ways. As in the case of speech, many types of image understanding
control strategies are possible, including the top-down hierarchical
approach, the mixed top-down and bottom-up approach, the heterarchical
approach, and variants of the blackboard approach. In the top-down
approach, predictions made from high-level models in the knowledge base are
tested and verified, such as the assumption that there is a road in the
image. In this case, templates are commonly used to validate higher level
hypotheses. In the blackboard approach, the feature extraction, symbolic,
and semantic processors operate in parallel and communicate with each other
via a common working data storage or "blackboard." The important point here
is that knowledge influences all levels of the computation, whatever the
control strategy, and very often requires iterative processing at lower
levels to test any hypothesis.

From the above discussion, the reader should appreciate that the heart
of the vision problem is the recognition of complex objects. Eliminating
current computational bottlenecks involves developing new, more efficient
algorithms and interfacing techniques to allow flexibility in transitioning
between the basic component detection phase, the extraction/feature
grouping process, and the high-level interaction with some knowledge base.
... hierarchical control and hypothesis testing, will be revisited in the
next section, on natural language understanding.

D.  NATURAL  LANGUAGE  UNDERSTANDING 

A  commonly  asked  question  is: 

"What  is  natural  language  processing,  and  how  does  it  difK  from 

speech  understanding?" 

To  answer  this  question,  we  will  first  define  natural  language 
understanding,  and  then  explore  its  relationship  with  the  types  of 
knowledge  used  in  speech  understanding.  Using  an  example  from  a  well  known 
piece  of  optics  literature,  we  will  then  discuss  some  of  the  techniques 
utilized  in  natural  language  processing.  This  will  lead  directly  into  our 
discussion  of  expert  systems  technology  in  Section  III.E.

The most popular definition of natural language, and one that is as
accurate as any other, is that it is a language used in spoken or written
form as the primary means of communication by a community of people.
Hence, in some countries, and within most portions of the US, natural
language most often refers to English; in other countries, it is Spanish,
Chinese, or one of several other international languages. It should be
clear, therefore, that linguistic attributes play a large role in the
knowledge processing associated with natural language, and that it is
closely allied with the field of computational linguistics.

As was the case with speech and vision, natural language understanding
(NLU) had its genesis in man-machine interface research. Here, the desire
was to have the machine understand English phrases or sentences input to it
through some peripheral device, which was typically a keyboard. At that
time, the principal application, in addition to language translation, was
the querying of large data bases. The structuring and interpretation of the
query were very important, and the user had to be careful that the input was
accurately translated into the language of the data base. Drawing upon an
earlier example, a query to find articles in a data base on nonlinear
optical materials for 2-D Spatial Light Modulators could be structured
using natural language input as:

"Find all articles on nonlinear optical materials for 2-D Spatial Light
Modulators"

Early on, the system would have been able to interpret this input only
because all of the words were within the vocabulary of the system. Using an
appropriate indexing scheme, this input would be transformed to:

Class:  2-D(w)Spatial(w)Light(w)Modulators

Subindex:  nonlinear(w)optical(w)materials

where the (w) refers to the linking of the words. Even today, many data
base query systems still function in this manner. To illustrate the point
further, suppose it was desired to obtain articles on performance
parameters of 2-D Spatial Light Modulators. The query:

"Now find all articles on their performance parameters"

would only lead to an error message from the system, even if this statement
directly followed the first. This is because the second phrase contained an
indirect reference to an element of the first sentence, so that the subject
of the second phrase could not be unambiguously identified. This
application of natural language, termed interactive discourse, looks at
pragmatic knowledge (see Section III.B) to understand references in
conversations, alleviating the problems cited in the example.
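The role of discourse context in the two queries above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class QuerySession, its interpret method, and the small PRONOUNS set are hypothetical names introduced here, and the pattern "articles on <subject> for <topic>" is assumed rather than taken from any real query system.

```python
import re

# Assumption: a tiny closed set of words treated as indirect references.
PRONOUNS = {"their", "its", "them", "they", "it"}


class QuerySession:
    """Hypothetical query front end that remembers the last Class searched."""

    def __init__(self):
        self.last_topic = None  # discourse context carried between queries

    def interpret(self, query):
        words = re.findall(r"[A-Za-z0-9-]+", query)
        lowered = [w.lower() for w in words]
        rest = words[lowered.index("on") + 1:]      # text after "articles on"
        rest_lower = [w.lower() for w in rest]
        if "for" in rest_lower:                     # "<subject> for <topic>"
            split = rest_lower.index("for")
            subject, topic = rest[:split], rest[split + 1:]
        elif rest and rest_lower[0] in PRONOUNS:    # "their <subject>"
            if self.last_topic is None:
                raise ValueError("unresolved reference: " + rest[0])
            subject, topic = rest[1:], self.last_topic
        else:                                       # bare topic, no subindex
            subject, topic = [], rest
        self.last_topic = topic
        return {"Class": "(w)".join(topic), "Subindex": "(w)".join(subject)}
```

A session that has already seen the first query resolves "their" to the stored Class, while a fresh session raises an error, which mirrors the failure described in the text.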

In discussing natural language understanding, therefore, we are
treating a discipline which is analogous to the human abilities to both
read and comprehend texts as well as carry on dialogues. There are no
direct parallels between natural language and the signal/image processing
common to lower level speech and vision systems. There is a signal to
symbol transformation, but it occurs directly at the input stage, and from
that point on, all processing is symbolic in nature. However, one saving
point is that much of the same knowledge used in natural language
understanding is directly applicable to high-level speech understanding.

In  order  to  get  a  better  appreciation  of  natural  language  processing, 
it  is  worthwhile  to  identify  some  other  applications  of  NLU.  While  we  will 
address  some  of  the  main  applications,  the  interested  reader  is  referred  to 
the  book  by  Rich [18]  for  others.  Aside  from  input  interfaces,  a  principal
application  of  NLU  is  in  the  area  of  computer  programming.  Here,  the 
objective  is  to  replace  expressions  such  as: 

      J = 0
      DO 100 I = 1,50
      J = J + DATA(I)
  100 CONTINUE
      AVE = J/50


with:


"Calculate  the  average  of  the  50  pieces  of  data." 

Languages  are  being  developed  that  are  more  "English-like,"  especially  in 
the  field  of  AI,  as  we  will  see  in  the  discussion  on  expert  systems  in  the 
following  section.

Another aspect of NLU is text processing, by which we mean the
processing of multiple sentences or paragraphs to extract critical
information. This specialized extraction of information is very useful in
literature analysis, such as the assimilation of information on optical
computing from all of the journals of the IEEE and the AIP. Mechanical
translation of texts, say from English into Spanish, is another application
of text processing. Finally, the generation of natural language output, as
in the explanation facilities of expert systems, is another critical
application of NLU. The reduction of reasoning processes to a series of
simple sentences, however, is a staggering challenge, even for humans.

There are several elements to any natural language processing system.
The input is typically derived from a keyboard, although pointing devices
or microphones are also sometimes used. In this case, however, the
microphone is not responding to words or sentences, but is used instead to
input letters in conjunction with a menu-driven software routine.

Following  input,  the  NLU  system  decomposes  or  parses  the  sentence  to 
identify  word  relationships  and  dependencies.  The  parser  relies  on 
knowledge  about  the  grammar  of  the  language  to  identify  the  subject  of  the 
phrase  and  relate  nouns  and  verbs  to  their  modifiers.  For  example,  the 
well  known  phrase  from  the  journal  Applied  Optics,  "Optics  is  light  work," 
can  have  a  couple  of  different  parsings,  examples  of  which  are  shown  in 
Figure  21. 

Parsing  decomposes  the  input  phrase,  identifying  the  relationships 
between  the  words  and  storing  them  symbolically.  Following  the  parsing,  a 
semantic  interpreter  takes  this  information,  and,  using  information  from 
the  knowledge  base,  attaches  meaning  to  each  of  the  words  in  the  phrase. 
This  is  accomplished  either  by  lookup,  or  by  conversion  to  an  intermediate 
format  known  as  the  "meaning  representation  language."  This  language 
preserves  the  meaning  and  symbolic  relationships  of  the  words,  but  is 
designed  to  have  a  more  direct  mapping  onto  the  vocabulary  of  the  system 
than a random input may have. From the above example, the system may
determine that any of the following is true:

Optics = Not_Work (N)

Optics = Easy (Adj) + Work (N)

Optics = Work (N) on Light (N).
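The three candidate readings listed above can be generated mechanically once each word carries more than one sense. The following Python sketch is illustrative only: the toy LEXICON, the invented "Neg" tag used to produce the Not_Work reading, and the function name interpretations are assumptions made for this example, not part of any system described in the text.

```python
from itertools import product

# Assumed toy lexicon: each word maps to (part of speech, symbol) senses.
LEXICON = {
    "optics": [("N", "Optics")],
    "is": [("V", "=")],
    "light": [("Adj", "Easy"), ("N", "Light"), ("Neg", "Not")],
    "work": [("N", "Work")],
}


def interpretations(sentence):
    """Enumerate candidate meaning representations for 'X is light work'."""
    words = sentence.lower().strip(".").split()
    if len(words) != 4:
        raise ValueError("this sketch only handles four-word sentences")
    readings = []
    # Try every combination of word senses and keep the ones we can map.
    for senses in product(*(LEXICON[w] for w in words)):
        tags = tuple(tag for tag, _ in senses)
        subject, _, modifier, head = (symbol for _, symbol in senses)
        if tags == ("N", "V", "Adj", "N"):
            readings.append(f"{subject} = {modifier} (Adj) + {head} (N)")
        elif tags == ("N", "V", "N", "N"):
            readings.append(f"{subject} = {head} (N) on {modifier} (N)")
        elif tags == ("N", "V", "Neg", "N"):
            readings.append(f"{subject} = {modifier}_{head} (N)")
    return readings
```

Calling interpretations("Optics is light work.") yields one string per sense of "light"; choosing among them is left to the higher level processing discussed next.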

The relative merits of each of these interpretations are determined by
higher level processing. In this case, the representation is then passed
to the domain and discourse processors, which use pragmatic knowledge to
generate hypotheses about the meaning of the input. These hypotheses are
either verified by comparison with the knowledge base, or lead to
additional semantic, domain, and contextual processing.

In the above example, a bottom-up paradigm was used to explain the
knowledge processing, just as we applied it in the cases of speech
understanding and vision. The bottom-up approach, shown graphically in
Figure 22, can be used to move the processing up from the parser to the
semantic interpreter, domain and discourse processor, response planner, and
data and knowledge base translators. However, other approaches are equally
plausible. For example, a system architecture based on the blackboard
model (Figure 23) may be useful to combine partial understandings by the
processors and achieve full understanding more rapidly and in parallel.
A parser may, for instance, find two different parsings of the sentence
"Optics is light work," but other processors will have to determine the
most likely interpretation by using other knowledge.
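A minimal sketch of the blackboard idea follows, written in Python. It runs the processors one after another rather than in parallel, and everything in it is assumed for illustration: the Blackboard class, the two knowledge sources, and especially the scores 0.8 and 0.2, which merely stand in for whatever pragmatic knowledge a real domain processor would apply.

```python
class Blackboard:
    """Shared working storage that every processor can read and post to."""

    def __init__(self, sentence):
        self.entries = {"input": [sentence]}

    def post(self, level, item):
        items = self.entries.setdefault(level, [])
        if item in items:
            return False          # nothing new was contributed
        items.append(item)
        return True


def parser(bb):
    """Knowledge source: posts every parsing it can find (hard-coded here)."""
    changed = False
    for sentence in bb.entries.get("input", []):
        if sentence == "Optics is light work":
            changed |= bb.post("parse", ("Optics", "Adj:light", "N:work"))
            changed |= bb.post("parse", ("Optics", "N:light", "N:work"))
    return changed


def domain_processor(bb):
    """Knowledge source: scores parses with assumed pragmatic preferences."""
    changed = False
    for parse in bb.entries.get("parse", []):
        score = 0.8 if "N:light" in parse else 0.2   # invented weights
        changed |= bb.post("hypothesis", (score, parse))
    return changed


def run(bb, sources):
    """Let each source contribute until the blackboard stops changing."""
    progress = True
    while progress:
        progress = False
        for source in sources:
            progress = source(bb) or progress
    return max(bb.entries["hypothesis"])
```

Because the processors communicate only through the blackboard, the order in which they are listed does not change the final hypothesis, which is the property that makes the architecture attractive for parallel execution.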

Unfortunately, natural language processing is at present unable to
unambiguously determine the meaning of input sentences, or use contextual
or pragmatic knowledge effectively. Each phase of the processing is a
computational bottleneck in its own right, particularly when dealing with
input information which is flawed (e.g., wrong punctuation), or has errors
(e.g., misspellings). In the understanding of raw text, natural language
systems need to be more broadly applicable and more robust. They are
currently slow and limited to understanding text only in very narrow areas,
with a low accuracy of interpretation. As was the case with speech and
vision, there has been a tradeoff between generality and performance in
natural language processing systems.

An overall goal of natural language research is the development of a
sufficiently robust system (necessarily domain portable) which can achieve
high levels of accuracy in interpretation. As before, the desire to
optimize performance has initiated the development of parallel algorithm
research, with an eye towards achieving optical or multiprocessor
implementations. But, as in the case of vision, our understanding of
parallel tasking is not sufficiently developed to understand the
implications of this research.

From the preceding discussion, we can begin to see how a multiprocessor
or optical processor could be used to achieve understanding of natural
language in real time. A parser itself could be parallelized, since many
different morphemes, lexical variations, and syntactical structures can be
analyzed simultaneously. A blackboard approach to the main processors
would be immediately parallelizable.

Figure 22.  Elements of a Natural Language System

Figure 23.  Natural Language, a Blackboard Model

The  final  point  to  be  made  is  that,  in  describing  natural  language 
processing,  we  are  in  fact  discussing  a  behavior  of  computer  systems.  We 
have  crossed  into  the  area  where  we  are  looking  at  systems  that  can 
interpret  and  reason  about  inputs,  hopefully  even  learn  from  their 
mistakes.  The  issue  of  interpretation  raises  the  question  of  where
natural  language  processing  ends  and  where  expert  system
processing  begins.  Knowledge  representation  becomes  more  and  more  common  to
both,  since  stored  knowledge  will  have  to  be  used  in  order  to  understand 
new  natural  language  input.  General  knowledge  about  the  world,  i.e., 
common  sense,  as  well  as  specialized  knowledge  about  a  problem  domain  must 
be  used  to  determine  if  the  new  input  is  literally  plausible  and 
corresponds  to  physical  reality,  or  if  it  is  erroneous  as  opposed  to 
metaphorical,  humorous,  or  sarcastic.  These  questions  will  surface  again 
in  the  next  section,  where  we  will  look  at  the  discipline  known  as  expert 
systems. 

E.  EXPERT  SYSTEMS 

The  final  capability  of  symbolic  computing  that  we  will  discuss  is 
expert  systems.  These  systems  are  characterized  by  their  almost  total 
reliance  upon  manipulations  of  symbolic  information,  in  marked  contrast  to 
the  systems  presented  previously.  In  speech,  vision,  and  natural  language 
understanding, the emphasis was on some form of signal-to-symbol
transformation, coupled with the use of high-level knowledge. These are
disciplines which are dependent upon knowledge retrieval and reasoning
processes. Expert systems, on the other hand, involve the process of
knowledge acquisition in addition to the retrieval and reasoning process.
Advances in the other functional capabilities would be of great benefit to
expert systems technology, since it draws heavily on the interfaces
provided by these disciplines.

But what is an expert system? In its broadest sense, an expert system
is a machine which mimics or emulates the thought and reasoning processes
of a human expert. It seeks to apply the solution techniques used by
the human expert to solve a particular problem. These techniques are, in
many cases, just the rules of thumb or heuristics described in Section II.
The way experts look at a particular problem, the information they look at,
the data they require, their knowledge about the knowledge they possess
(what we term metaknowledge), what they ignore - these are the elements we
call expertise. Expert systems attempt to capture this expertise, and
apply it to a particular problem area or domain. The price to be paid, as
we have seen in earlier sections, is one of generality; to date, expert
systems have only been able to function in very specialized, narrow areas
of expertise. Some of these successes were cited earlier in this section:
R1 [8], the system that configures VAX and PDP series minicomputers;
DENDRAL [7], the system developed to interpret spectroscopic data; and
MYCIN [9], the system which aids physicians in making diagnoses in internal
medicine.

In  each  of  these  systems,  knowledge  was  placed  into  the  machine  which 
enabled  it  to  "understand"  the  problem,  in  much  the  same  way  as  was  done 
for  speech,  vision,  and  natural  language  systems.  It  was  then  possible  for 
the machine to function as a decision aid to the user. The system drew on
knowledge retrieval and reasoning to act as an expert, generating inferences
about the problem based on data supplied by the user. The power of these
expert systems lies in the amount of symbolic information which they can
store in their knowledge base, and in their ability to process it rapidly.

There are three components to any expert system (see Figure 24) - the
knowledge base, the inference engine, and the explanation facility. The
system's expertise is stored in the knowledge base, using one or more of the
representation types described in Section II.B. The knowledge base includes
the heuristics as well as any elements of metaknowledge required for the
task. The inference engine gives the expert system its reasoning capability,
allowing the system to combine rules or frames to generate a conclusion.
This usually includes backward chaining, or goal-directed reasoning, as
well as forward chaining and expectation-driven reasoning. The algorithms
that underlie these reasoning processes are very often matching procedures,
comparing the truth of one statement relative to that of another. As an
example, the Rete algorithm, used in the popular OPS-based expert systems
(such as the R1 system cited above), matches antecedents of production
rules to the global memory in conducting a forward chaining operation.
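The matching step at the heart of forward chaining can be illustrated with a deliberately naive Python sketch. It is not the Rete algorithm, which avoids re-testing every rule on every cycle; it simply re-scans all rules until nothing new can be concluded. The rule names and fact strings are hypothetical, loosely modeled on the optical diagnosis example later in this section, and Rule.3 in particular is invented for the illustration.

```python
# Each rule: (name, set of antecedent facts, consequent fact). All assumed.
RULES = [
    ("Rule.1", frozenset({"detector output null"}), "detector not at fault"),
    ("Rule.2", frozenset({"nonlinear device present"}),
     "device sensitive to alignment"),
    ("Rule.3", frozenset({"detector not at fault",
                          "device sensitive to alignment"}),
     "suspect device alignment"),
]


def forward_chain(rules, facts):
    """Fire every rule whose antecedents all appear in working memory."""
    memory = set(facts)        # the "global memory" of known facts
    fired = []                 # order in which rules fired
    changed = True
    while changed:
        changed = False
        for name, antecedents, consequent in rules:
            # Match: all antecedents present, and the rule adds something new.
            if antecedents <= memory and consequent not in memory:
                memory.add(consequent)
                fired.append(name)
                changed = True
    return memory, fired
```

Starting from the two observed facts, the third rule can fire only after the first two have added their conclusions to memory, which is the data-driven behavior that distinguishes forward from backward chaining.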

The  explanation  facility  is  what  allows  the  user  to  understand  the 
machine's  solution  strategy,  to  determine  why  particular  conclusions  were 
reached.  Very  often,  the  explanation  facility  is  a  modified  natural
language  interface,  allowing  the  user  to  ask  the  machine:

"How  did  you  arrive  at  this  conclusion?" 

and  receive  an  answer  which  shows  the  evolution  of  the  machine's  reasoning 
process.  For  a  rule-based  system,  an  appropriate  response  to  the  above 
question  could  be  a  list  of  rules  that  were  "fired,"  much  like  the  trace  of 
a  FORTRAN  program.  As  the  size  of  the  knowledge  base  increases,  and  the
expected  number  of  rule  firings  scales  proportionately,  this  explanation
technique  becomes  insufficient.  What  is  required  is  a  system  that  can 
summarize  the  reasoning  process  employed  by  the  system.  Such  an  interface 
could  take  advantage  of  research  in  natural  language  understanding,  seeking 
to  allow  those  systems  to  cooperate  with  the  expert  system,  possibly  in  a 
blackboard  architecture. 

The development of a "classical" expert system typically requires a
team of three people - the expert, the knowledge engineer, and a symbolic
programmer. As we stated before, the expert supplies the knowledge and
expertise to the system, which is stored in the system's knowledge base.
The knowledge engineer has the task of acquiring the knowledge from the
expert, typically via interviews and by presenting the expert with
simulations of the problem. This is programmed knowledge acquisition, and
is independent of any knowledge obtained from external sensors or learned
by the system itself. The symbolic programmer takes the input from the
knowledge engineer and literally programs the knowledge into the system.
Since this is a labor-intensive process, early expert systems, such as the
ones cited earlier, required many man-years of effort to develop.

This division of labor has been eased greatly by the advent of
advanced expert system building environments, and now, the knowledge
engineer and the symbolic programmer are very often the same person. These
building environments are "shell programs," containing all of the tools
that a programmer needs to develop an expert system. In such a tool, an
organized knowledge base is supplied, but it is devoid of any knowledge.
The inferencing ability is also supplied, allowing conclusions to be
generated once the knowledge base is "filled." Drawing an analogy with
spreadsheet programs for microcomputers, these environments provide the
equivalent of an empty spreadsheet. The programmer has the ability to
input knowledge into the knowledge base, much the same way that an
accountant can input figures into the spreadsheet, and tailor the program
to meet his or her needs. Finally, the ability to combine heuristics to
reach conclusions is analogous to the combination of spreadsheet cells to
form a new entry. Just as the spreadsheet revolutionized the use of
microcomputers, these building tools have decreased the development time
associated with expert systems from several man-years to several
man-months, depending on the level of difficulty of the problem.

The knowledge base of an expert system is its power, since it has been
found that specialized knowledge is an essential adjunct to logic. The
arduous process of incorporating that knowledge into the machine is a major
limiting factor, even in narrow domains. Interactive knowledge acquisition
tools, such as TEIRESIAS, have proven their usefulness in helping the
domain expert express knowledge in forms compatible with the knowledge
base. More sophisticated systems can be built to infer rules themselves
from presented data, as was done with MetaDendral. Even better, we can
build systems which can learn to guide their own search strategies, i.e.,
learn heuristics, as was done with ACT and EURISKO. But the principles
of learning and adaptation are still poorly understood. Consequently, most
expert systems are not currently learning systems. Rather, they rely on the
application of prestored knowledge instead of learning from problem
solving, from failures, and from correcting errors. The goal of developing
systems that learn, in the form discussed in Section II, could greatly
decrease the time and effort spent in the development of knowledge based
systems.

An advanced feature of intelligent systems, and expert systems in
particular, is their ability to pursue multiple lines of reasoning in
generating a conclusion or decision. This has the advantage of being able
to prune or reduce the potential solution space by applications of either
goal directed (backward chained), model directed, or forward chained
reasoning. This is a very powerful technique, which, although still in its
infancy, will allow a problem to be approached from multiple angles until a
solution is generated. But any opportunity for conducting simultaneous,
multiple viewpoint reasoning is heavily dependent upon developing the
appropriate parallel computational structures. This is due to the
increased amount of knowledge based interaction in such a system, which
would only worsen an existing bottleneck in uniprocessor-based AI systems.
Unfortunately, as we will see in the next section, our understanding of
parallelism in computations is rather limited. But the potential benefit
of parallelism in symbolic computation is one factor which makes optical
computing particularly attractive.

Having clarified what an expert system is, we can now investigate what
such systems can do. Hayes-Roth, Waterman, and Lenat, in their guide to
expert systems, identified ten areas of application for these systems. These
ten, shown in Figure 25, suggest that there are a large number of potential
applications of expert systems, for everything from decision aids, to the
construction of a laser (within given constraints), to isolating the source
of a failure in an optical system.

We would like to focus on the last of the above instances in a
simplistic example of application (3). We have taken some liberties in
developing the structure of this example, and we realize that it deviates
somewhat from standard experimental practice. Nevertheless, we feel it
will be useful in conveying the principles of expert systems, and will also
serve as a summary of the ideas discussed in previous sections.

Figure 25.  Applications of Expert Systems

Prototype Optical System Used in Expert System Example

For the purposes of this example, let us consider a system which
consists of a laser, a narrow spectrum bandpass filter, a nonlinear optical
device, a detector and a couple of lenses in the configuration shown in the
prototype optical system figure. In this system, the laser emits light into
the first lens and subsequently into the filter, where the beam is changed
or modified in some way. In this example, we will assume that the filter
only passes light corresponding to the exact wavelength of the laser, so
that any variations in the laser's output spectrum will lead to a decrease
in light beyond the filter. This "filtered" beam is then transmitted
through some nonlinear optical device, and the output of the device falls
on the detector. Using a series of frames to represent this system, at the
most simplistic level we have the following elements in the knowledge base
of the system:

We have represented the optical path of the system by the slots From:
and To:, reminding us of a semantic net relationship between the various
frames. The changes to the light as it moves through the system are
represented by the variations in the slot Output:. Finally, the attribute
Input: stores the knowledge that laser light moves through the system. Let
us now suppose that the detector's output drops to zero ((= Current NIL),
in LISP code), indicating a problem, and the system needs to assist the
user in determining the source of the problem. In this context, our simple
expert system can function as a diagnosis aid.
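The frame representation described above, with its From:, To:, Input:, and Output: slots, might look like the following Python sketch. The component names follow the text, but the slot values for Output: are assumed purely for illustration, and the second lens is omitted because its position in the optical path is not stated.

```python
# One frame per component; slot names follow the text, values are assumed.
FRAMES = {
    "Laser":    {"From": None,     "To": "Lens.1",
                 "Input": None,       "Output": "laser.lt"},
    "Lens.1":   {"From": "Laser",  "To": "Filter",
                 "Input": "laser.lt", "Output": "focused laser.lt"},
    "Filter":   {"From": "Lens.1", "To": "Device",
                 "Input": "laser.lt", "Output": "filtered laser.lt"},
    "Device":   {"From": "Filter", "To": "Detector",
                 "Input": "laser.lt", "Output": "modulated laser.lt"},
    "Detector": {"From": "Device", "To": None,
                 "Input": "laser.lt", "Output": "Current"},
}


def optical_path(frames, start="Laser"):
    """Follow the To: slots from the source to the end of the path."""
    path = []
    name = start
    while name is not None:
        path.append(name)
        name = frames[name]["To"]
    return path


def links_consistent(frames):
    """Check that every To: link is mirrored by the matching From: link."""
    return all(
        frames[frame["To"]]["From"] == name
        for name, frame in frames.items()
        if frame["To"] is not None
    )
```

Walking the To: slots recovers the optical path, and the From: slots give the reverse traversal that a backward-reasoning diagnosis would need.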

In our example, the system could proceed in one of two ways, either
working backward from the detector or forward from the laser. In the
former case, the system could hypothesize that there is no output because
there is no input; in other words, there is no laser light reaching the
detector. For this to be true, one of the following must also be true:
the detector is faulty, the nonlinear device is faulty, or a component
earlier in the optical path is the problem. Here is where some expertise
can enter the problem, since the system may know that:

Rule.1:  If the detector output drops to null, then the detector is
not at fault.

Rule.2:  If there is a nonlinear optical device, then the outputs of
the device are sensitive to alignment.

Applying this expertise to the problem, the system can eliminate the
detector and other components as sources of the problem, and hypothesize
that the nonlinear device is at fault. By interacting with the user to
obtain the additional required information, the system can resolve the
truth of the initial hypotheses. This interaction with the user may take
the form of:


System:  "Is Device input still equal to Laser.lt?"

User:  "Yes."

With this additional information, the system then concludes that the problem
is in fact with the device.
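The backward line of reasoning in the dialogue above can be sketched in Python as a walk from the detector toward the laser. This is a simplified illustration under stated assumptions: the UPSTREAM table, the function diagnose_backward, and the ask callback (standing in for the question put to the user) are hypothetical, and Rule.1 is applied by never suspecting the detector itself.

```python
# Assumed optical path, expressed as "component -> component feeding it".
UPSTREAM = {
    "Detector": "Device",
    "Device": "Filter",
    "Filter": "Lens.1",
    "Lens.1": "Laser",
    "Laser": None,
}


def diagnose_backward(ask, start="Detector"):
    """Walk back from the detector until a component with good input is found.

    ask(component) should return True when the user reports that the
    component's input is still equal to laser.lt.
    """
    # Rule.1: a null detector output clears the detector, so start upstream.
    component = UPSTREAM[start]
    while component is not None:
        # A component with good input but bad output is the culprit; the
        # laser has no input, so reaching it leaves it as the only suspect.
        if UPSTREAM[component] is None or ask(component):
            return component
        component = UPSTREAM[component]
    return None
```

With the user answering "Yes" to the first question, the search stops at the nonlinear device, as in the text; a "No" would move the hypothesis one component further up the path.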

In  the  other  paradigm,  the  system  could  assume  that  the  laser  is 
faulty,  and  work  its  way  out  to  the  detector  by  way  of: 


If no laser.lt from the Laser, then no laser.lt into Lens.1; Lens.1:
Input = Null;

If Input = Null, then Output = Null;

If no laser.lt from Lens.1, then no laser.lt into Filter; etc...

finally concluding:

If no laser.lt from the Laser, then no laser.lt into Detector;

If Detector Input = Null, then Detector Output = Null.

Having  generated  the  hypothesis  that  the  laser  is  defective,  it  could  then 
verify  the  hypothesis  by  asking  the  user: 

System:  Is  the  output  of  the  Laser  still  equal  to  Laser.lt? 

(Does  Laser:  Output  =  Laser.lt?) 

If  the  answer  is  positive,  some  other  component  in  the  optical  path  is
defective.  The  system  can  then  hypothesize  that  another  component  is 
faulty,  and  iterate  on  the  above  process. 
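The forward paradigm can be sketched the same way: assume a component is faulty, propagate Null down the path, and confirm the hypothesis with the user before moving on. In this Python illustration the PATH list, the function names, and the output_ok callback are assumptions, and each component's output is simplified to either "laser.lt" or None.

```python
# Assumed order of components along the optical path.
PATH = ["Laser", "Lens.1", "Filter", "Device", "Detector"]


def propagate(faulty):
    """Predict every Output: slot if the component `faulty` emits nothing."""
    outputs = {}
    signal = "laser.lt"
    for name in PATH:
        if name == faulty:
            signal = None      # If Input = Null, then Output = Null onward
        outputs[name] = signal
    return outputs


def diagnose_forward(output_ok):
    """Hypothesize each component in turn, starting from the laser.

    output_ok(name) should return True when the user reports that the
    component's output is still as expected.
    """
    for candidate in PATH:
        # Keep only hypotheses that explain the null detector output.
        if propagate(candidate)["Detector"] is not None:
            continue
        if not output_ok(candidate):
            return candidate   # user confirms this component's output is bad
    return None
```

Each pass through the loop corresponds to one hypothesis-and-question cycle in the text, so a fault near the detector costs more questions than one near the laser.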

The  above  example  is  typical  of  expert  system  operation.  The  system 
reaches  conclusions  based  on  the  truth  of  a  series  of  knowledge  base 
elements  at  any  particular  point  in  time.  Hypotheses  are  generated  and 
validated  through  additional  interactions  with  the  user,  who  supplies  the 
necessary  information  about  the  state  of  the  system.  In  this  manner,
expert  systems  can  function  as  a  diagnostic  aid  to  a  variety  of  users. 

The  example  may  also  allow  the  reader  to  appreciate  the  difficulties 
associated  with  Incorporating  knowledge  and  expertise  in  AI  systems.  This 
is  another  Instance  of  our  recurring  theme  of  the  tradeoff  between 
generality  and  performance.  One  of  the  limitations  of  expert  systems  is 
the  narrowness  of  the  domain  of  expertise  incorporated  into  a  system.  Each 
system  Is  a  relatively  Isolated  project,  and,  as  a  result,  the  techniques 
developed  to  solve  the  particular  problem  are  not  applicable  to  all  expert 
systems.  And  Increasing  the  cize  of  the  knowledge  base,  equivalent  to 
expanding  the  domain  of  expertise,  just  leads  to  a  combinatorial  explosion 
of  possible  inferencing  and  machine  states.  There  are  also  hardware 
limitations,  such  as  the  size  of  the  working  memory  in  the  computer,  which 
can only store so much knowledge at any point in time.

The reader may ask why increasing the size of working memory, thus
enlarging  the  active  knowledge  base  at  any  one  time,  will  not  at  least 
partially  alleviate  the  narrowness  problem.  It  does,  but  we  are  still 
faced  with  the  difficulty  of  organizing  and  managing  the  knowledge,  and 
then  channeling  it  through  in  serial  fashion  to  the  processor.  This  is 
complicated by the difficulties in representing certain types of knowledge,
such as time-varying data, data with certainty which varies over time, and
knowledge about processes and causality. The representations discussed in
Section II.B are inadequate for many tasks because they are unable to
appropriately store the time variations or statistical uncertainties
associated with that knowledge. Going back to our example on isolating a
failure in an optical system, the use of the words "usually" and "typically"
in the rules implies an uncertainty in the knowledge:

Rule 1: If the detector output drops to null, then the detector is
usually not at fault.

Rule 2: If there is a nonlinear optical device, then the outputs of
the device are typically sensitive to alignment.


At  present,  our  abilities  to  represent  these  types  of  knowledge  limit  the 
breadth  and  robustness  of  most  knowledge-based  systems. 
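
One common workaround, not something proposed in this report, is to attach a numeric certainty to each rule. The sketch below assumes such a scheme purely for illustration: the class name UncertainRule and the values 0.8 and 0.7 are invented placeholders for "usually" and "typically". Note that it still does not capture how certainty varies over time, which is the harder problem raised above.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class UncertainRule:
    condition: str
    conclusion: str
    certainty: float  # assumed numeric stand-in for words like "usually"


# The two rules from the text; the certainty values are placeholders.
RULES = [
    UncertainRule("detector output drops to null",
                  "detector is not at fault", 0.8),
    UncertainRule("nonlinear optical device present",
                  "outputs are sensitive to alignment", 0.7),
]


def conclusions(facts: Set[str], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return conclusions of every rule whose condition is a known fact."""
    return [(rule.conclusion, rule.certainty)
            for rule in RULES
            if rule.condition in facts and rule.certainty >= threshold]


print(conclusions({"detector output drops to null"}))
```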

As  an  example,  consider  the  problems  associated  with  the 
representation  of  objects  in  space  and  the  spatial  and  temporal 
relationships  among  them.  Such  problems,  which  are  commonplace  in  vision 
research,  also  arise  in  representing  relationships  in  expert  systems.  How 
will  a  computer  "understand"  that  both  the  light  which  is  incident  on  a 
lens  and  the  light  that  emerges  from  the  other  side  are  really  part  of  the 
same beam? Other types of knowledge may best be stored nonverbally, such
as in visual images. After all, a picture or graphic may be worth many lines
of computer code. But the appropriate way to store graphical knowledge so
that  it  can  be  updated,  retrieved  and  used  in  making  inferences  is  an 
unsolved  problem. 


Even if we could expand the domain of expertise, computational
bottlenecks  exist  in  the  knowledge  processing  which  limit  the  performance 
of most expert systems to between 10 and 1000 rule inferences per second
(RIPS).  For  purposes  of  comparison,  this  corresponds  roughly  to 
throughputs  of  1  to  100  MegaFLOPS  for  numerical  computations.  This  is  not 
purely  a  hardware  bottleneck,  since  much  of  the  computational  load  rests  in 
how  the  software  structures  the  knowledge  retrieval  and  reasoning 
processes. As an example, one goal of expert systems research is to
effectively modularize the way knowledge is stored and manipulated. At the
present time, there is virtually no separation or modularity between the
various components in an expert system - the explanation facilities and
user interfaces share the same memory with the knowledge base and the
inference engine. Organizing, tracking, and processing all of this
knowledge in uniprocessors effectively limits the rate at which symbolic
computations can be performed. This is an example of the famous "von
Neumann bottleneck," which will be discussed in the next section.

These  difficulties  currently  limit  the  size  of  the  knowledge  bases  and 
hence  the  overall  robustness  of  the  system.  What  is  desired  is  shown  in 
Figure  28,  where  the  components  of  the  expert  system  have  been  modularized, 
but  are  still  interacting  at  multiple  levels  within  the  system.  Each 
component  could  possibly  function  on  an  independent  processor,  with  each 
parallel  process  communicating  by  sending  messages  to  other  processing 
elements.  This  separation  could  allow  for  expert  systems  with  larger 
domains, more robust inference capabilities, and more diverse applications.

F.  CONCLUSION 

Having generally introduced the main functional capabilities within
symbolic computing, we have seen that each focuses on improving the
machine understanding of the domain in question. Understanding, in a
limited sense, is achieved through extensive interactions with the
knowledge base of the system. Two main characteristics of intelligent
systems, the heuristic retrieval of knowledge and the various reasoning
processes, were central to each of the capabilities discussed in the
section.  The  acquisition  of  knowledge,  and  subsequent  programming  into  the 
system,  played  a  leading  role  only  in  the  expert  systems  area.  However, 
relative to current system performance, it is this organization,
manipulation and processing of knowledge that present the main computational
bottlenecks  in  AI  systems. 

Another  point  worth  noting  at  this  juncture  is  that  the  types  of 
operations  performed  in  AI  systems  are,  for  the  most  part,  very  domain 
specific.  In  many  cases,  the  actual  instructions,  memory  utilization 
techniques,  evaluation  and  matching  metrics,  and  search  processes  are 
embedded  within  the  control  sequence  for  the  intelligent  system.  It  is 
therefore  very  difficult  to  cite  common  operations,  other  than  to  show  the 
overlap of techniques in high level processing. An example of this can be
seen in the area of expert system development tools, where each tool
possesses  its  own  control  structure,  its  own  group  of  representations 
(production  rules,  semantic  nets,  frames,  scripts,  etc.),  its  own  reasoning 
paradigm  (forward  chaining,  backward  chaining,  expectation  driven,  or  a 
combination  of  these),  and,  as  a  result,  its  own  matching  or  evaluator 
structure.  While  any  of  these  tools  may  have  some  operations  in  common 
with others, typically the variations are quite large from environment to
environment.

An  underlying  theme  of  this  section  was  the  difficulty  associated  with 
representing  knowledge  in  AI  systems.  To  be  useful,  and  reasonably 
general,  multiple  representation  schemes  will  likely  have  to  be  used  within 
any  given  system  to  express  various  kinds  of  knowledge,  especially  the 
difficult areas mentioned above. A problem which has quantities of both
declarative and procedural knowledge apparently needs multiple
representation schemes. Thus, systems will evolve with ever larger knowledge
bases and with multiple types of representations, in order to address more
sophisticated and more general problems. The challenge will lie in the
organization and management of this knowledge so that it does not lead to
additional bottlenecks or decreases in the processing rate.

It has been estimated that just to overcome existing AI system
bottlenecks, the computational throughput rates of symbolic computers will
have to be increased by several orders of magnitude over current
capabilities.[23] While highly parallel systems are being studied to address
this problem, limitations in our current understanding make it evident
that parallelism alone cannot achieve the required speedup in computation
rates. The use of alternative systems, such as optical computing systems,
shows great promise if the appropriate coding schemas and operations can be
made optically compatible.

The previous sections have highlighted topics of current interest in
symbolic computation, with an eye towards identifying potential
applications of optical techniques. Applications such as vision and speech
recognition have direct mappings into image and signal processing, tasks
to which optical processing and computing techniques are particularly well
suited. The major challenge for optics is the application of knowledge
processing techniques on top of this lower level processing.

The  main  point  in  the  earlier  discussions  was  that  the  operations 
where  optics  excels  are,  for  the  most  part,  the  same  types  of  operations 
utilized  in  symbolic  computations.  So,  while  the  promise  of  applying 
optical  techniques  to  AI  problems  exists,  some  strong  challenges  remain, 
such  as  optically  compatible  representations,  understanding  parallelism  (in 
optics  and  in  computations  in  general),  memory  interactions,  proper 
optical/electronic  interfaces,  and  the  ability  to  implement  the  desired 
architectures  and  required  component  technologies.  These  ideas  will  be 
explored  in  greater  detail  in  the  next  section,  which  looks  at  potential 
architectures for symbolic computation.

CHAPTER IV

SYMBOLIC COMPUTING ARCHITECTURES

A. INTRODUCTION TO SYMBOLIC ARCHITECTURES


The previous sections have laid the groundwork for the remaining
discussion of symbolic computer architectures. Reflection on our earlier
description of the fundamental characteristics of symbolic computing should
raise awareness of the importance of relationships between data
elements - the relationships between objects and attributes in LISP, the
relationships between nodes of semantic networks, the relationships within
and between frames, etc. This has led computer scientists to investigate
the performance improvements that could be forthcoming by the use of
computer architectures for which the connectivity between processor nodes
could reflect the relationships fundamental to symbolic computing. Such
thinking derives partially from experience with numeric computers in
enhancing performance by matching architecture to algorithmic structures
and vice versa. Of course, one must keep in mind the importance of
maintaining flexible architectures that can adapt to changing relationship
patterns; otherwise, the result will be special purpose machines with
limited utility.

The  similarity  of  these  highly  connected  architectures  to  neurological 
systems  lends  credence  to  their  importance  in  symbolic  processing.  The 
brain,  with  its  relatively  slow  components  (neuron  speeds  on  the  order  of 
milliseconds),  is  able  to  process  symbolic  information  at  rates  several 
orders of magnitude faster than conventional (von Neumann) computer
architectures. Two differences between the electronic and biological
systems that stand out are the connectivity between components and the
intermix between processor and memory operations. Neurons in the brain can
have upwards of 10,000 synapses (biological connectors) whereas their
electronic counterparts, gates, typically have only a few connections to
other gates. In the area of memory distribution, the von Neumann
architectures are characterized by a separation between the processor and
memory functions, and the transfer of information between the two often
results in a speed bottleneck.

A criticism of von Neumann architectures is in no way intended.
After all, when von Neumann proposed this processor/memory separation, it
was based on the limitations imposed by the technology of that time period
(late 1940s). Also, such architectures have proven vastly superior to the
brain in performing numeric operations. Although it is beyond the scope of
this Chapter to discuss optimal symbolic/numeric systems, future systems
may someday consist of cooperating symbolic and conventional numeric
processors.

Computers  with  high  levels  of  connectivity  and  distributed  memory  have 
been labeled parallel processors due to their capability of supporting
concurrent operations. Before discussing the potential for optical parallel
processing, we would like to familiarize the reader with the principles and
terminology of parallel processing in general, and to discuss broad
categories of parallel architectures.

B. ARCHITECTURES FOR PARALLEL PROCESSING


There exist numerous taxonomies for classifying parallel
architectures, but this discussion will touch on only those that are deemed
useful in the context of this Chapter. For a more complete treatment, the
reader is referred to the book by Hwang & Briggs.[24] As a start, one can
deal with temporal parallel versus spatial parallel processors. The former
most often takes the form of pipelining, which is the sequential execution
of instructions or operations such that initial phases of follow-on
instructions or operations are initiated before the later phases of
previous instructions or operations are completed. Spatial parallel systems
are designed to execute multiple parts of a problem simultaneously. They
consist of two or more processing elements, most often of approximately
comparable capabilities, such that each element contains at least an
arithmetic logic unit and a set of registers. Although the arrangement and
connectivity of spatial parallel processors can change, the diagram in
Figure 29 illustrates the general concept behind the architectures. Note
that interconnection networks, preferably programmable ones, play a major
role in parallel processing.
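
The difference between temporal and spatial parallelism can be made concrete with a simple cycle count. The sketch below rests on idealized assumptions that are not in the text: every phase takes one cycle, operations are independent, and there are no stalls or communication delays. The function names are illustrative only.

```python
def sequential_cycles(n_ops: int, n_phases: int) -> int:
    """Each operation runs all of its phases before the next one starts."""
    return n_ops * n_phases


def pipelined_cycles(n_ops: int, n_phases: int) -> int:
    """Temporal parallelism: a new operation enters the pipe every cycle."""
    if n_ops == 0:
        return 0
    # The first operation needs n_phases cycles; each later one adds a cycle.
    return n_phases + (n_ops - 1)


def spatial_cycles(n_ops: int, n_phases: int, n_elements: int) -> int:
    """Spatial parallelism: n_elements identical processors work side by side."""
    rounds = -(-n_ops // n_elements)  # ceiling division
    return rounds * n_phases


print(sequential_cycles(100, 4))   # 400
print(pipelined_cycles(100, 4))    # 103
print(spatial_cycles(100, 4, 10))  # 40
```

Under these assumptions the pipelined speedup approaches the number of phases, while the spatial speedup approaches the number of processing elements.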

A classification presented by Seitz,[25] shown in Figure 30, provides an
interesting categorization of parallel systems based on the number of
processors and the relative degree of processor complexity. Conventional
uniprocessor  architectures  are  plotted  as  a  point  of  reference  representing 
high  complexity  in  a  single  processor.  As  one  moves  up  to  more  than  one 
processor,  the  trend  is  toward  reduced  complexity  within  each  processor,  a 
trend  that  is  driven  by  total  system  cost  on  the  one  hand  and  by  an 
escalating  overall  system  complexity  on  the  other  hand.  Microcomputer 
arrays  are  basically  a  set  of  computers  that  send  messages  to  one  another 
via  a  communication  network  as  illustrated  in  Figure  31.  Such  systems  are 
usually  loosely  coupled  (versus  tightly  coupled);  that  is,  the  individual 
computers  do  not  share  main  memory  and  I/O  devices,  although  one  computer 
can  always  draw  upon  another's  resources  through  the  communication  network. 
The  application  of  such  systems  in  symbolic  computing  will  likely  be  in 
solving  problems  that  involve  the  use  of  more  than  one  knowledge  base. 
Each  processor  can  work  on  a  given  part  of  the  problem  in  such  a  way  as  to 
minimize the need for interprocessor communications.

The next category, computational arrays, represents systems whose
processing elements have been designed for operations on the order of
complexity of multiplication and addition. Systolic arrays, for which the
processors are connected in regular patterns that match the flow of data in
the computation, comprise most of the architectures in this category;
however, more general purpose computational arrays will likely emerge as
interconnection networks become more flexible. We will return to this
point in our discussion of hybrid optical-electronic systems in
Section IV.E.

The  final  categories  of  parallel  machines,  logic-enhanced  memories  and 
artificial  neural  systems,  can  be  thought  of  as  "smart  memories."  Such 
memories  can  greatly  alleviate  the  "von  Neumann  bottleneck,"  posed  by  the 
need in a von Neumann architecture to constantly move information to and
from memory, by serving as the memory for a host computer; that is, some of
the processing is transferred to the memory to avoid the speed delays
incurred by transfers between separated memory and processor units. Each
element (or node) of the logic-enhanced memories contains upwards of
several thousand bits of storage, a set of registers, and some associated
logic capable of operating on the storage contents and of directing
communication with the other elements. For the case of artificial neural
systems,  the  complexity  of  the  nodes  begins  to  approach  that  of  a  switching 
element.  These  systems  are  referred  to  as  fine-grained  parallel  processors 
due  to  the  relatively  low  complexity  of  the  individual  processing  elements, 
and  they  are  always  tightly  coupled.  They  must  consist  of  at  least  several 
thousand  elements  to  achieve  a  practical  computing  power.  In  fact,  some 
fine-grained  architectures  with  as  many  as  one  million  elements  are 
currently  on  the  drawing  boards. 

The final entry shown on Figure 30, that of random access memories
(RAMs), is given as a reference on the fine-grained end just as the
uniprocessors were shown as a reference for nodal complexity. RAMs, of
course, do not function as multiprocessors due to the absence of
connectivity.

It is the tightly-coupled fine-grained architectures that are
attracting the most interest for symbolic computing because of the emphasis
on connectivity over processing power. For example, each node of a
semantic network could be mapped to a separate node of such an
architecture, and the processing power can be directed toward establishing and
identifying  the  types  of  the  links.  Some  of  the  million  processor  machines 
mentioned  above  fall  into  the  category  of  logic-enhanced  memories,  and  are 
being  considered  for  handling  semantic  networks  consisting  of  a  few  hundred 
thousand  links,  and  for  supporting  LISP  with  a  few  hundred  thousand  cons 
(connection)  cells. 

One other popular classification of parallel systems deals with
Single-Instruction-Multiple-Data (SIMD) versus Multiple-Instruction-
Multiple-Data (MIMD). There are two other categories - SISD and MISD -
whose definitions should be obvious. SISD represents most of the serial
architectures in existence, and MISD represents a concept which has
received very little attention due to its questionable practicality;
therefore, only SIMD and MIMD are mentioned in the context of parallel
processing. MIMD represents full parallelism and consequently is the most
complex of the four categories, requiring individual control units for each
of the processors and the ability to efficiently identify and allocate
subsets of the problem at hand among the individual processors. The
interconnection networks permitting interaction between the units
differentiate MIMD systems from systems in which a problem is divided into
operations performable on a multitude of SISD machines. Investigations into
MIMD architectures have mostly been limited to loosely coupled systems,
primarily due to limited knowledge of parallel processing.

The most significant gains in the near future in understanding
parallel operations will likely come from implementing SIMD architectures,
and therefore they have accounted for most of the current activity in
parallel architectures. SIMD does not necessarily mean that all processors
are executing the same instruction set but only that they are being
presented the same instruction sequence, and each processor can be rendered
operable or non-operable by the control unit. For the symbolic operations
best suited to large fine-grained machines, SIMD may make more sense than
MIMD due to the large overhead incurred by either storing entire
instruction sets at each processor or routing instructions through the
interprocessor network.
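
The SIMD behavior just described, one instruction sequence broadcast to all processors with a control unit enabling or disabling each one, can be sketched as follows. This is a software illustration only; the function simd_step, the enable mask, and the two-instruction program are assumptions made for the example and do not describe any particular machine.

```python
from typing import Callable, List


def simd_step(instruction: Callable[[int], int],
              local_data: List[int],
              enabled: List[bool]) -> List[int]:
    """Broadcast one instruction; only enabled processors execute it."""
    return [instruction(value) if is_on else value
            for value, is_on in zip(local_data, enabled)]


# Four processing elements, each holding one local value (assumed data).
data = [1, 2, 3, 4]
mask = [True, False, True, False]  # control unit enables elements 0 and 2

# The same instruction sequence is presented to every element.
program = [lambda x: x + 10, lambda x: x * 2]
for instruction in program:
    data = simd_step(instruction, data, mask)

print(data)  # [22, 2, 26, 4]
```

A MIMD machine would instead need a separate program, and a separate control unit, for each element, which is the overhead the text refers to.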

No matter what category given architectures fall into, they all share
one overwhelming problem - the need for interconnection networks that can
efficiently handle message transfers around the system. The performance of
the message-transfer network is a prime factor in determining overall
system performance. Figure 29 identified the three basic functional areas
in which the networks are important (processor/memory, processor/processor,
and processor/IO), and although the performance parameters vary from one to
the other depending on the limitations of the components being
interconnected, the existing architectures are applicable to any of the areas,
and the choices in the past have been made more on the grounds of
technology limitations and cost than functional differences. The
following discussion of networks, therefore, will be independent of the
functional area.

The thrust in interconnect architectures is toward programmable
networks. This is not surprising since the power of the symbolic
architectures, especially the fine-grained ones, comes from the interconnects;
hence, increasing the flexibility of the nets by making them programmable
goes a long way toward enhancing the overall processing power.
Furthermore, the system designer is usually willing to trade some speed for
the robustness of the slower programmable nets by exchanging hard-wired
interconnects for switchable ones which can adapt the network pattern to the
data. In symbolic processors, adaptability is even more important than for
numeric processors because the topology of the data structures is usually
irregular; therefore, the optimum network topologies cannot be determined
prior to system design. In passing, it should be mentioned that
programmability is also important from the fault tolerance standpoint since
it permits system reconfiguration to bypass faulty processors.

The  interconnect  networks  can  be  categorized  into  one  of  three  general 
classes:  multiplexed  buses,  multiport  components,  and  switching  networks. 

The bus is the simplest, and therefore the most popular, interconnect
method because it involves the fewest switch elements. However,
contention  for  these  switches  when  many  users  (processors)  are  involved 
limits  their  utility  mostly  to  loosely-coupled  multiprocessor  arrays  for 
which  the  contention  for  bus  resources  is  not  as  great  as  for  the  more 
tightly  coupled  systems. 

The  contention  problem  can  be  lessened  by  providing  components  (e.g., 
memories)  with  more  than  one  port  and  by  increasing  the  number  of  system 
buses such as shown in Figure 32. This, of course, increases the
complexity of the components.

The  third  class  of  interconnects  uses  a  network  of  switches  that 
establish  the  communication  paths  around  the  system.  The  most  general 
network,  a  generalized  crossbar  switch,  is  capable  of  establishing 
independent communication links between all components connected to the
network; that is, there exists no sharing of switching elements between
channels and therefore no contention for the switching resources. Although
such a network represents the ultimate in interconnect power, its
implementation in electronic hardware has proven too demanding for large
systems in terms of cost, power requirements, and crosstalk avoidance. A k x m
generalized  crossbar  has  k  input  ports  for  connection  to  k  data  or  message 
sending  components  and  m  output  ports  for  connection  to  m  receiving  nodes. 
If  each  of  n  nodes  of  a  system  were  to  have  one  transmitting  port  and  one 
receiving  port  (or  instead,  one  bidirectional  port),  then  an  n  x  n  crossbar 
would  permit  the  n  nodes  to  be  fully  interconnected  and  free  of  any 
contention  for  network  resources. 

For the sake of simplicity, a crossbar of dimension only 3 x 3 is
illustrated in Figure 33. Note that the cost in terms of hardware for an
n x n crossbar would be n^2 switches and 2n^2 bidirectional communication
links. For large n, implementation in electronic integrated circuit
technology becomes a formidable problem, especially the design of the 2n^2
bidirectional links with adequate bandwidth and an acceptable limitation on
crosstalk. Also, electronic components have a very limited fan-out
capability; for example, the fan-out limitation for an electronic gate is
approximately ten other gates. The electronic solution has been to fall
back to multi-stage switching networks. An example of one of many possible
implementations (a baseline network) is shown in Figure 34a, where each
switching element is limited to a fan-in of 2 and a fan-out of 2. The
interconnect structure is a three-stage network - all switches in a
vertical column function as one stage of the network. This 8 x 8
interconnect network is composed of twelve 2 x 2 crossbar switches, the
functions of which are illustrated in Figure 34b. Only a total of
48 switches (12 x 2 x 2) are needed instead of the 64 needed for an 8 x 8
crossbar. But the multi-stage design leaves the door open for contention.
Numerous topologies (e.g., tree, banyan, delta, Clos, mesh, to name a few)
exist for interconnecting small-dimensional crossbars, and the choice is
usually one between cost and an acceptable degree of contention. For
example, if the degree of contention associated with a five-stage Clos
network is acceptable, a 1000 x 1000 interconnect network would require
only 146,300 switches rather than the one million required for the
crossbar. A more thorough discussion of network topologies is given by
Hwang & Briggs.
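
The switch counts quoted for the 8 x 8 case can be checked with a few lines of arithmetic. The sketch below assumes a baseline network built from 2 x 2 elements, with log2(n) stages of n/2 elements each and four crosspoints counted per element; the function names are illustrative. It does not attempt to reproduce the Clos figure, whose exact configuration is not given here.

```python
from typing import Tuple


def crossbar_switches(n: int) -> int:
    """A full n x n crossbar needs one switch per input/output pair."""
    return n * n


def baseline_network(n: int) -> Tuple[int, int]:
    """Return (2 x 2 elements, total crosspoints) for an n x n baseline network.

    Assumes n is a power of two, log2(n) stages, and n/2 elements per stage.
    """
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two and at least 2")
    stages = n.bit_length() - 1      # exact log2 for powers of two
    elements = stages * (n // 2)     # 2 x 2 switching elements
    crosspoints = elements * 2 * 2   # four crosspoints per 2 x 2 element
    return elements, crosspoints


print(baseline_network(8))       # (12, 48)
print(crossbar_switches(8))      # 64
print(crossbar_switches(1000))   # 1000000
```

The saving grows quickly with n: the crossbar count scales as n^2, while this multi-stage count scales as 2n log2(n), at the price of possible contention.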

Given  that  a  single  generalized  crossbar  is  not  practical,  the  network 
designer  must  decide  what  topology  best  fits  the  data  structure  most  likely 
to  be  encountered.  As  an  example,  a  prime  topology  for  interconnecting  a 
parallel  processor  designed  for  low  level  image  processing  would  be  a  grid 
structure  enabling  nearest  neighbor  interconnects  since  the  vast  majority 
of the low level operations involve adjacent image pixels. But as one
moves toward the higher operations, regional relationships between the data
become more important, requiring more global interconnect topologies.
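
A nearest-neighbor grid of the kind mentioned above can be sketched as a function that lists the processors directly linked to a given one. The sketch assumes a four-connected mesh with no wraparound at the edges, one processor per pixel; both choices, and the function name, are assumptions made for illustration.

```python
from typing import List, Tuple


def grid_neighbors(row: int, col: int,
                   n_rows: int, n_cols: int) -> List[Tuple[int, int]]:
    """Processors directly linked to (row, col) in a four-connected mesh."""
    candidates = [(row - 1, col), (row + 1, col),
                  (row, col - 1), (row, col + 1)]
    # Keep only positions that lie inside the array (no wraparound links).
    return [(r, c) for r, c in candidates
            if 0 <= r < n_rows and 0 <= c < n_cols]


print(grid_neighbors(1, 1, 3, 3))  # interior element: four links
print(grid_neighbors(0, 0, 3, 3))  # corner element: two links
```

Each element has at most four links regardless of array size, which is why such a grid is cheap to build but poorly suited to the regional and global relationships needed at higher levels of processing.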

C.  OPTICAL  IMPLEMENTATION  OF  MULTIPROCESSOR  ARCHITECTURES 


Any approach to optical architectures must give serious consideration
to what can be accomplished with existing technologies, namely VLSI
electronics. Optical computing will not seriously threaten electronic
computing unless it can offer several orders of magnitude improvement in
some critical measurement criterion, such as the power-speed-cost product,
in a given problem domain. Therefore, a good starting point in addressing
the application of optics to symbolic computing is to identify problem
areas for electronics.

It is not surprising that the relative weaknesses and strengths of
electronics and optics are traceable in one way or another to the
fundamental physics of inter-electron and inter-photon interactions. Relatively
speaking, the interactions between electrons are strong while those between
photons are weak. Hence, electrons are good for the switching operations
so fundamental to computing and photons are good for the inter-switch
communications, providing links which are free from detrimental coupling
effects that lead to crosstalk and capacitive loading. Subscribing to such
reasoning, however, is impractical due to the quantum losses which
accompany both the electron-to-photon and the photon-to-electron
conversions. There is research and development underway to replace some of
the longer interconnect links within computers with optical channels
because it is the longer interconnects that create severe power, speed, and
space problems for electronics.[26] But such a capability stops far short of
using optics to its full advantage in multiprocessor architectures
appropriate for symbolic computing.

Consider the electronic-switching/optical-communications position as
representing one of the four corners of the square shown in Figure 35 for
which the sides of the square represent a continuum of combinations between
the extremes of the corners. The upper left corner represents
all-electronic systems while the bottom right represents all-optical. Since
movement toward the bottom left corner is out of the question, the focus is
along the upper and right sides. Upon considering computing systems for
which switching is the predominant function, the tradeoff between optics
and electronics is seen to fall somewhere along the upper edge; that is,
all-electronic switching with some optical links. However, symbolic
processing places a strong emphasis on communications as has been pointed out
numerous times earlier in the Chapter. The de-emphasis on switching
(fine-grained architectures) and the emphasis on communications (tightly-coupled
systems) leads one to consider architectures for which the communications
are optical and only some of the switching is done with electronics. It is
this category of electronic/optical hybrid architectures that we believe
will have a significant impact on symbolic computing.

Toward the upper end of this continuum would be those architectures
which employ optical switching in the performance of reconfiguring the
interconnects but which employ electronic switching for logic operations.
As mentioned earlier, optics will prove to be especially valuable in
providing the longer (more global) interconnects due to the power, speed,
and space penalties associated with the longer electronic interconnects.
An example of an architecture with such a designated mixture of optics and
electronics is shown in Figure 36. For the sake of simplicity, the
illustration shows only two of many possible boards and shows only four
chips per board. If this were a fine-grained processor, each chip itself
would contain many processing elements (PEs). For example, one such
electronic symbolic computer currently under development, called the
Connection Machine,27 is composed of a large array of printed circuit
boards each of which contains 512 PEs equally divided between 32 chips
(i.e., 16 PEs per chip).

Each board in Figure 36 contains four optoelectronic chips and one
frequency selective filter (hologram). Between adjacent boards is a planar
array of reconfigurable diffraction gratings that perform the majority of
the switching operations involved in the interconnection process. This
particular architecture employs wavelength division multiplexing (WDM) to
direct optical bit streams to the appropriate board. The beam labeled #1
illustrates this operation. The hologram directly overhead of the
transmitting chip directs the beam to the center of the next board where it
is superimposed on the main beam which travels to all of the system boards.
Upon reaching the intended board, the frequency selective filter diffracts
the beam to a bus-to-board hologram which directs the beam to its final
destination. The intra-board and intra-chip interconnects would be
handled by the plane of holograms above the board as illustrated by beam
#2. The logistics of handling a large number of multiplexed beams will not
be discussed here other than to say that the optical switching most likely
will be achieved through nonlinear wave mixing. For example, four-wave
mixing may be used to generate holograms28 which can be rapidly varied to
permit interconnect reconfiguration. The reconfiguration beams shown in
Figure 36 would contain the desired information for changing the
holographic gratings. Note that some of the switching actions of such an
architecture are performed by the multiplexing action, and the various
holograms act as passive gratings that selectively direct the various
wavelengths.

Such a versatile interconnect scheme based on directing light beams
through free space contrasts with one of the most severe problem areas for
electronic symbolic computing - that of implementing reconfigurable
interconnects. At present, electronics depends on wires for all
interconnects, limiting reconfigurability as well as fan-out capabilities.
This places a severe limitation on the switching network architectures,
which is complicated by interconnect intensive problems, like those found
in AI. The spectrum of optical networks, with their much greater
versatility, will not be reviewed here; however, the interested reader is
referred to publications of Sawchuk et al.29

As one moves toward the bottom right corner of the classification of
Figure 35, the percentage of optical implementation increases until an
all-optical architecture is achieved. An example of a fine-grained,
tightly-coupled optical computer is shown in Figure 37. Although no one has
built such a computer, it is technically possible to achieve such a system
consisting of one million parallel channels. This does not mean that the
system would be configured necessarily with one million nodes since such a
configuration implies that the planar array of logic elements (designated
as the gate array) would have just one logic element per channel. Instead,
several logic elements would usually be interconnected via the interconnect
media to form a processing element. For example, a square array of n x n
logic elements (gates) may comprise an arithmetic logic unit, several
registers, and possibly some cache memory. An example of this type of
structure is shown in Figure 38, where individual elements in a 2-D SLM
have been assigned the necessary functions to comprise a computational
processing element. Taking an n of 5 (25 logic elements/processor) would
lead to a machine with 40,000 nodes - large enough to be practical as a
symbolic computer.
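
The node count quoted above follows from dividing the channel count by the
number of logic elements per processing element. The short Python sketch
below repeats that arithmetic; the function name is illustrative only and
does not come from the report, and it assumes every processing element uses
a full n x n block of channels.

```python
def processor_nodes(channels: int, n: int) -> int:
    """Number of processing elements when each one occupies an n x n block
    of logic elements (one logic element per optical channel)."""
    return channels // (n * n)


# One million parallel channels, 5 x 5 logic elements per processor.
nodes = processor_nodes(1_000_000, 5)
print(nodes)
```

A larger n buys more logic per node at the cost of fewer nodes, which is
the tradeoff the paragraph above describes.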

The  input  to  the  optical  computer  could  be  either  through  an  array  of 
independently  addressable  laser  diodes  or  a  two-dimensional  spatial  light 
modulator  (2D  SLM).  The  diode  array  would  be  capable  of  much  higher 
modulation  speeds,  but  would  involve  more  complex  circuitry,  especially  if 
operation  requires  uniformity  over  the  complete  array.  If  the  input 
already exists as a two-dimensional light pattern such as might be output
from a vision processor, an input device may not be needed depending on the
compatibility of the two processors.

The logic element array could be either a 2D SLM exhibiting a nonlinear
response or an array of optical bistable switches. The latter
device will ultimately lead to much higher switching speeds, but current
realizations of optical bistable switches have required impractical power
levels. Improved nonlinear optical materials are currently under
development for better optical bistable devices.

The interconnect element will likely employ wave mixing in a nonlinear
optical medium, similar in operation to that described previously for the
hybrid architecture. However, due to the much larger number of channels
that must be handled, the switching may be done in a multi-stage fashion in
which multiple parallel planes of real-time hologram arrays would be
exercised as illustrated in Figure 39. Note that, for the sake of
simplicity, all three interconnect functions (processor/processor,
processor/memory, and processor/IO) are combined into one block, but they
could be implemented by three independent devices.

The detector will be a major technological challenge. In the most
general case, one would like a one million channel device with each channel
operating around 1 MHz (projected speed for 2D SLMs). However, the
requirements will be much less for most practical processor designs. If
the problem domain were to require, say, 100 iterations or more (e.g.,
semantic network searches to depths of at least 100), an output would be
required only once every 100 microseconds. This reduces the throughput
requirements for the detector to 10^10 bits per second (bps), a number more
in line with projections for GaAs microelectronics. Another example would
be where each processor consists of a block of n x n channels as discussed
above. Assuming an n equal to 5 and that each processor has just one
output channel, the throughput requirement for the detector would be
4.0 x 10^10 bps. Some combination of these two designs should yield a
detector requirement that would be well within technical feasibility.
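
The three detector figures in this paragraph can be checked with a few
lines of arithmetic. The sketch below is only a restatement of the numbers
in the text (one million channels, about 1 MHz per channel, 100 iterations
per output, n = 5); the constant and function names are assumptions made
for illustration.

```python
CHANNELS = 1_000_000         # one million parallel channels
CHANNEL_RATE_HZ = 1_000_000  # about 1 MHz per channel (projected 2D SLM speed)


def detector_throughput_bps(output_channels: int,
                            channel_rate_hz: int,
                            iterations_per_output: int = 1) -> int:
    """Bits per second the detector must absorb, assuming one bit per
    channel per read-out and one read-out every `iterations_per_output`
    cycles."""
    return output_channels * channel_rate_hz // iterations_per_output


# Most general case: every channel is read on every cycle.
general = detector_throughput_bps(CHANNELS, CHANNEL_RATE_HZ)

# Output needed only once per 100 iterations (once every 100 microseconds).
iterative = detector_throughput_bps(CHANNELS, CHANNEL_RATE_HZ, 100)

# One output channel per 5 x 5 processor block, read on every cycle.
blocked = detector_throughput_bps(CHANNELS // 25, CHANNEL_RATE_HZ)

print(general, iterative, blocked)
```

Combining the two reductions, as the text suggests, would simply apply both
the smaller channel count and the iteration divisor in the same call.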

The last major component of this opto-electronic architecture is the
memory. The goal of co-locating much of the memory with the logic elements
is not necessarily transferable to the optical computer domain because of
the greatly reduced communications delays. Thus, Figure 37 shows the main
memory as a single block, equally shared by all of the processors. Another
important niche for optics is multiport memories. In fact, the use of
multiple wavelengths could enable the read-out of any given memory location
by a multitude of channels simultaneously, thereby avoiding the need for
complex contention-resolving circuitry. For example, holographic gratings
could be used to demultiplex the superimposed reflections of a multitude of
wavelengths reflected from a given spot on an optical disk, or a
holographic memory element could be used that would spatially separate the
various read-out wavelengths. A way in which this could be implemented is
shown schematically in Figure 40, where multiple beams could be used to
address an optical disk simultaneously.

Another appealing attribute of using multiple wavelengths in optical
computing is that the switching control is transferred to the
information-carrying beam itself rather than having to exist as a separate
entity, which would add greatly to the complexity of the computer control
operations. This more closely parallels the operation of message routing
systems in which the initial bits of the message bit stream contain the
address information which is used by each switch that the message
encounters as it propagates through the network.
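
As a rough software analogy for this self-routing behavior, the sketch
below treats the carrier wavelength as the destination address: each board
holds a frequency selective filter tuned to one wavelength, and a beam is
delivered to the first board along the bus whose filter matches. The class,
the function, and the wavelength values are hypothetical and chosen only
for illustration; they model the idea, not the optics.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Beam:
    wavelength_nm: int  # the carrier wavelength doubles as the address
    payload: bytes


def deliver(beam: Beam, board_filters: Dict[int, int]) -> Optional[int]:
    """Walk the boards in bus order and return the id of the first board
    whose filter passband equals the beam's wavelength, or None if no
    board is tuned to it. No separate control message is consulted."""
    for board_id, passband_nm in board_filters.items():
        if passband_nm == beam.wavelength_nm:
            return board_id
    return None


# Hypothetical filter assignment: board id -> selected wavelength (nm).
FILTERS = {0: 1300, 1: 1310, 2: 1320}

destination = deliver(Beam(wavelength_nm=1310, payload=b"tag"), FILTERS)
print(destination)
```

The routing decision depends only on a property carried by the beam, which
is the point of contrast with a scheme that needs a separate control path.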

The processing power of the all-optical architecture could be enhanced
through the use of pipelining. This could be achieved by replicating the
logic element array as shown in Figure 41. Pipelining would be useful for
multi-dimensional problems such as vision processing dealing with
time-varying three-dimensional imagery (e.g., each plane could handle a
different image depth).

By now, the reader should have an appreciation of the types of
architectures required for symbolic computation, and of several ways for
achieving them optically. It should be clear, however, that even these
"all-optical" structures are in some sense hybrid optical-electronic
architectures. In particular, electronics would be used for interfaces to
the user, to digital controllers, etc., whereas the optics would be used to
expedite the symbolic processing. With reference to Figure 36, we would
expect that there is a spectrum of levels where the optical-electronic
interface could occur. This spectrum of architectural possibilities,
ranging from the all-electronic systems to the hybrid optical-electronic
systems, leads to other possible roles for using optics in symbolic
computation. Some of these possibilities will be explored in the following
section.

E. HYBRID OPTICAL-ELECTRONIC SYSTEMS

There are several levels on which optics can be effectively combined
with electronics. As shown in Figure 42, there is a hierarchy of functions
in hybrid systems, ranging from replacing the processor for almost all
operations, as in the all-optical systems of the previous section, to the
use of optics only as a peripheral device, such as an optical disk for
storage. The differences are in the degree of coupling between the
electronic processor and the optical system, and in the amount of
computation performed by the optics. In the following discussion, we are
assuming that the electronic system is the host processor, and that the
optical system is connected to it via one of the system busses.

At the lower end, the optical system would entirely replace the
electronic system at the processor level, and would therefore perform
almost all of the computation and would interact strongly with the memory
of the electronic system. Using the terminology of Section IV.B, such a
hybrid system would be said to be tightly-coupled. At the next level, we
have a structure where the optical system computes some of the primitives
(multiplication, addition, subtraction, etc.), while the electronics
processes others. An example of this could be an optical pattern matcher
connected to the data bus of a LISP machine, where the optical system
performs all the matching operations, leaving the electronics to compute
other primitives. This is also a tightly-coupled system, since the
processors share physical memory, and is analogous to the concept of an
optical co-processor. The bandwidth of the interconnection network must be
very high, and roughly equal to several times the memory access and
transfer rate.

Accelerators are processors which dramatically increase the throughput
of a particular function, such as an inner product, a correlation, or a
rule firing. As we saw in Section III, inner products play a major role in
AI processing, such as feature comparison, template matching, and
correlation processing at the lower levels, and in knowledge base searches
and inferencing at higher levels. Later in this section, we will look at
how inner products can be applied to the processing of "if..., then..."
types of rules.

Accelerators have had great impact in uniprocessor numeric computation,
all but eliminating a number of previously troublesome computational
bottlenecks. Surprisingly, they have not yet found widespread application
in symbolic computing or in multiprocessor systems, and optics could help
hasten that process. As in Figure 44, the optical computer could again be
connected to the data bus of the system, but it does not share memory with
the electronic system. This is an example of a more loosely-coupled hybrid
system, where the host machine and the accelerator are proximally located
but not necessarily within the same housing.

Special function processors (SFPs), as the name implies, have traded
generality for performance, maximizing the throughput of a specific
function. Typically they are very specialized computers with limited
programmability, limited memory, and minimal interfacing requirements. As
separate computing units, SFPs are connected to the host via a network, an
optical fiber, or some other high-bandwidth medium. They are also referred
to as computational arrays, and have been used successfully as feature
extractors in low-level vision and speech, as display processors in
computer graphics, and as array processors for computing FFTs. The most
popular SFP is the systolic array, for which there is a rich base of
experience, and a number of optical implementations as well.

For the remainder of this section, we would like to focus on two
examples from the discussion above, namely the use of optical computing as
an accelerator and as a special purpose processor. For the accelerator
case, we will use the example of inner product processing for "if...,
then..." type rules. As a special purpose processor, we will look at the
potential role of systolic array implementations of semantic net
processing.

A  central  operation  in  symbolic  computing  is  the  inner  product,  which 
is  equivalent  to  a  multiplication  of  component  elements  in  a  vector  (vector 
multiplication),  in  a  matrix  (matrix-matrix  multiplication),  or  in  a 
correlation  function.  In  earlier  sections,  we  identified  the  commonality 
of  inner  products  in  a  large  number  of  algorithms  in  the  numeric  computing 
domain.  In  one  typical  symbolic  computing  representation,  knowledge 
relations  are  expressed  in  terms  of  logical  pattern  matching,  such  as 
determining the agreement of an antecedent condition (left-hand side) of an
"if A, then B" relation (see Section III.E). Here A takes the form
of a vector subspace of an N-dimensional vector space:

$$A = \{\text{data/objects that belong to class } A\}$$

which is spanned by some M vectors, where $M \le N$:

$$a_i(k) \in A, \qquad k = 1, \ldots, M$$

Membership in this subspace can be verified by means of a simple functional
operation. For example, the null functional of a subspace is uniquely
represented in terms of the vector $a_A$ that is orthogonal to the subspace
$a(k)$. Then, the inner product:

$$\langle a_A, a_i(k) \rangle = 0 \quad \text{for all } a_i(k) \in A$$

Thus,  a  calculation,  or,  in  this  case,  a  rule  firing,  between  knowledge 
elements  in  this  representation  and  some  appropriate  functional  may  be 
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production  rule  must  be  satisfied  during  an  inference.  This  is  also  true 
for frame-based representations, since, in that case, each knowledge
element is a 2D array of information processed in its entirety, and the
$a_i$'s may take the form of matrix components. For each rule firing, the
matching operation can be computed on the optical system very efficiently,
alleviating a serious computational bottleneck in several AI systems.
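
The matching step described above can be sketched in a few lines of Python.
The example assumes a hypothetical class A that is the plane spanned by
(1, 0, 0) and (0, 1, 0) in a three-dimensional feature space, so a single
null vector (0, 0, 1) characterizes it; in general an M-dimensional
subspace of an N-dimensional space needs N - M such vectors, which is why
the code accepts a list. All names and the tolerance are illustrative
assumptions, and the code stands in for the optical inner-product hardware
rather than describing it.

```python
from typing import Optional, Sequence


def inner(u: Sequence[float], v: Sequence[float]) -> float:
    """Ordinary inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def antecedent_matches(candidate: Sequence[float],
                       null_vectors: Sequence[Sequence[float]],
                       tol: float = 1e-9) -> bool:
    """True when the candidate is orthogonal to every null vector, i.e.
    when it lies in the subspace that represents class A."""
    return all(abs(inner(a, candidate)) <= tol for a in null_vectors)


# Hypothetical class A: the x-y plane, characterized by one null vector.
NULL_VECTORS_A = [(0.0, 0.0, 1.0)]


def fire_rule(candidate: Sequence[float]) -> Optional[str]:
    """'if A, then B': return the consequent label when the antecedent
    matches, otherwise None (the rule does not fire)."""
    return "B" if antecedent_matches(candidate, NULL_VECTORS_A) else None


print(fire_rule((2.0, -3.0, 0.0)))
print(fire_rule((1.0, 1.0, 0.5)))
```

Each rule firing reduces to a handful of inner products, which is the
operation an optical accelerator would be expected to perform in parallel.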

Inner products are not the only operations for which optics may have a
role. Systolic arrays, of which there have been several optical
implementations,7-9 have been shown to have definite mappings onto
signal-flow graph networks. This implies that problems treatable by or
based on graph-theoretical techniques may have direct mappings onto
well-defined systolic array topologies. In symbolic computing, graph theory
analysis has been applied to developing relationships between definable
objects and their attributes; this research has led to the semantic net
representation.

For purposes of illustration, the semantic net can be viewed as a
collection of nodes representing symbols, which are connected by links
representing relations. The most fundamental relationship between symbols
is the "IsA" link, and other types of relations could be "AtLocation,"
"MemberSet," and "Partof." Such relations are domain specific, and depend
upon the taxonomy of the problem under investigation. In this
representation, a basic question common to symbolic computation systems
would be "Is A a B?" If we assume that all connections between a general
class of nodes S are constituted by "IsA" links, then this query is
reducible to the problem of:

"For A a member of S,"

"Is B also a member of S?" and

"Is there a connectivity between A and B?"

On a conventional architecture, this query would involve an extensive
sequential search over all memory paths emanating from A. However, on a
systolic system, the conclusion involves only the number of steps between
A and B, or to the edge of the array (for B not an element of S);
this is because the search would proceed along all branches simultaneously.

To  map  this  question  onto  the  systolic  system,  we  can  define  each  node 
to  reside  on  a  processor  or  a  pixel,  and  the  links  to  adjacent  nodes  are 
the  existing  topological  connectivity  of  the  array.  Here,  the  array  is  a 
3-D  construct  with  the  third  dimension  being  time.  Node  A  can  be  tagged 
as the element of interest, and the resulting search towards B can occur
as  internodal  bit  stream  propagation  in  each  time  step.  This  is  shown 
schematically  in  Figure  43.  While  this  still  requires  some  test  for  the 
match  condition,  the  set  of  semantic  net  problems  should  have  some  direct 
mapping  onto  systolic  array  systems. 
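
The time-stepped propagation described above can be imitated in software
as a wavefront search. The sketch below is an assumption-laden
illustration: the link table and node names are made up, links are treated
as directed "IsA" edges, and a sequential loop stands in for what the array
would do in parallel. Within each loop iteration every tagged node forwards
the tag along all of its links, so the returned count corresponds to the
number of time steps between A and B.

```python
from typing import Dict, List, Optional


def isa_query(links: Dict[str, List[str]],
              start: str,
              target: str) -> Optional[int]:
    """Return the number of time steps for a tag launched at `start` to
    reach `target` along the links, or None if the wavefront dies out
    (the analogue of reaching the edge of the array)."""
    frontier = {start}
    visited = {start}
    step = 0
    while frontier:
        if target in frontier:
            return step
        # One time step: all tagged nodes propagate to their neighbors.
        frontier = {nbr
                    for node in frontier
                    for nbr in links.get(node, ())
                    if nbr not in visited}
        visited |= frontier
        step += 1
    return None


# Hypothetical semantic net made only of "IsA" links.
ISA_LINKS = {
    "Canary": ["Bird"],
    "Bird": ["Animal"],
    "Fish": ["Animal"],
    "Animal": ["LivingThing"],
}

print(isa_query(ISA_LINKS, "Canary", "Animal"))
print(isa_query(ISA_LINKS, "Canary", "Fish"))
```

The match test inside the loop corresponds to the "test for the match
condition" that the text notes is still required at each step.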

In  summary,  optics  offers  several  unique  capabilities  for  the  generic 
architectures  discussed  above,  architectures  which  hold  great  promise  for 
symbolic  computing.  The  most  important  of  these  capabilities  are:  high 
speed global interconnects, interconnect reconfigurability, high fan-out
elements, and multiport components for parallel processing.

AI LANGUAGES AND TOOLS
A principle of Artificial Intelligence according to N. J. Nilsson's Onion
Model.

AGENDA
A part of an expert system that contains a prioritized list of knowledge
based rules awaiting execution.

ANALOGICAL KNOWLEDGE
Knowledge that is represented by analogy in a computer. Examples of this
are sound patterns representing words in a natural language processing
system, or the representation of an image by a 2D array of numbers
corresponding to steps on a gray scale.

ARCHITECTURE
The organization of the individual elements of a computer.

ASSERTION
A positive statement or declaration.

ATOM
A symbol (either constant or variable) used to identify an object in a LISP
program.

ATTRIBUTE
Part of the description of an object contained in a frame. Attributes
normally tell characteristics such as color, size, and value. The same as a
slot.


THE BDM CORPORATION


GLOSSARY (CONTINUED)

BACKWARD CHAINING
A recursive procedure for problem solving in knowledge based systems. A
progression by goal-driven inference is attempted between an assumed goal
state and one or more initial states. If the progression is not possible,
the goal state is altered. If the progression is possible, the initial
state or states become goal states and the procedure begins again.

BACKTRACKING
A search procedure in which guesses are made about the direction to be
taken through the solution space. When the guesses lead to an unacceptable
result, the procedure backtracks to the point at which the incorrect
guesses were made and begins the search procedure again in alternative
directions.

BELIEF
A hypothesis about the outcome of some unobservable or uncertain situation.


BLACKBOARD
A part of many artificial intelligence systems in which intermediate or
partial results of problem-solving are recorded.

BREADTH FIRST SEARCH
A search strategy used in knowledge based systems where the possible
solutions to the problem are represented by a tree with nodes and branches.
In breadth first search, all branches of the tree are examined at one node
or level before moving to the next level. In this way, searching the
breadth of the solution space is emphasized. See Depth First Search.

BOTTOM UP
A method of problem-solving which progresses from an initial condition to
some desired condition. See Data directed inference and forward chaining.

CERTAINTY
A measure of the confidence placed by a user on the validity of a
proposition, hypothesis, or inferential rule.

COMMON SENSE REASONING AND LOGIC
A principle of AI from Nilsson's Onion model.

COMPUTATIONAL LOGIC
Logical reasoning done on a symbolic computer. This is the basis for
non-numeric computations done in AI systems.


CONTINUOUS SPEECH
Normal spoken language. The majority of speech recognition systems accept
either isolated or continuous speech.

CONTROL
The determination of the overall order or organization of problem solving
procedures or activities.

DATA-DIRECTED INFERENCE
The type of inferences employed in forward chaining. By applying inference
rules to supplied data or conditions a logical result is derived.

DECLARATIVE KNOWLEDGE
Knowledge consisting of facts or assertions.

DEPENDENCY
The relation between logical conclusions and the premises and inference
procedures from which they were derived.

DEPTH-FIRST SEARCH
A search strategy in knowledge based systems in which possible solutions to
the problem are represented by a tree with nodes and branches. In depth
first search, a branch of related solutions is considered at all nodes
before moving to the next branch. In this way, the depth of an assumed
class of solutions can be examined before moving to the next branch. See
Breadth first search.


DOCUMENT GENERATION
One of many proposed applications of natural language processing systems.
After information is stored in a computer, the computer generates a
document containing the information.

DOCUMENT PREPARATION
Another proposed application of natural language processing systems. NLP
systems act as experienced editors, checking for errors in spelling and
grammar and suggesting ways to rephrase text.

DOCUMENT UNDERSTANDING
A proposed application of NLP systems in which a document is read and its
contents are assimilated by the system.

EXPECTATION-DRIVEN REASONING
A control procedure that uses expectations to formulate hypotheses about
unobserved situations. See backward chaining and goal-directed inference.

EXPERT FRAMEWORK SYSTEMS
The knowledge representation and reasoning mechanisms of an expert system
without the domain specific knowledge base.

EXPERT SYSTEM
A computer system that achieves high levels of performance in areas that
for human beings would require years of special education and training.


EXPERT SYSTEM DEVELOPMENT ENVIRONMENT
The knowledge representation and reasoning mechanisms of an expert system
without the domain specific knowledge base.

EXPERTISE
The capabilities that enable high performance in a particular area. In the
context of AI systems, this includes tools that enable intelligent
operation such as meta-knowledge, heuristic search procedures, and
inference rules.

EXPLANATION SUBSYSTEM
A part of an expert system that rationalizes or explains its conclusion by
providing a summary of the inference rules and data that it used to arrive
at the conclusion.

FACT
A piece of declarative knowledge.

FORWARD-CHAINING
A procedure for problem solving in knowledge based systems. The procedure
progresses from the data given by the user to a solution that adequately
supports it. See data driven inference.

FRAME
Data structures that represent objects by a list of properties and
relations to other objects.

FUZZY LOGIC
An approach to inexact reasoning consisting of a proposition and a fuzzy
set. The fuzzy set contains ranges of possible values the proposition can
have and numerical values corresponding to the probability of occurrence in
each range.

GOAL-DIRECTED INFERENCE
The type of inferences employed in backward chaining. Emphasis is placed on
examining the states and inference rules which produce a desired goal. See
backward chaining and expectation-driven reasoning.

GRAY SCALE
Analog or digital numbers corresponding to shades of gray. The maximum and
minimum numbers represent white and black.

HEURISTIC
Of or relating to problem solving procedures that utilize self educating
techniques to improve their performance.


HEURISTIC SEARCH
A principle of AI from Nilsson's Onion model.

An approach to planning in which computer time and memory are saved by
considering only high level details of the plan. Vague and lower level
details are then formulated into subplans.

HUMAN ENGINEERING
The process of engineering man-machine interfaces to achieve maximum
utilization of that machine.

INFERENCE
The process of passing from a proposition whose truth is established to
another whose truth is believed to follow from that of the former.

INFERENCE ENGINE
A part of an expert system containing the procedures it will use to solve
problems.

INFERENCE RULES
Methods or strategies employed by AI systems for solving problems by
logical inference.

ISOLATED SPEECH
Speech that has pauses between the words. The majority of speech
recognition systems accept either isolated or continuous speech.

KNOWLEDGE
Stored information for use in the solution of AI problems. In the case of
expert systems, knowledge is thought of as facts, beliefs, or heuristic
rules. In the case of speech and image recognition systems, knowledge is
thought of as stored pattern, image, or word data, beliefs, and heuristic
rules.

KNOWLEDGE ACQUISITION
The extraction and formulation of knowledge for use in AI systems. The AI
computer equivalent of learning.

KNOWLEDGE BASE
The repository of knowledge in an AI computer system.

KNOWLEDGE ENGINEERING
A discipline associated with the building of expert systems.

KNOWLEDGE PROGRAMMING TOOLS
Software that allows an expert who knows little about knowledge engineering
to program knowledge into an AI system.

KNOWLEDGE REPRESENTATION
A principle of AI from Nilsson's Onion model.

LEARNING
The process of improving the performance of an AI system by using past
experience to alter its stored knowledge or problem solving strategies.


LEXICON 


LOGIC  ORIENTED  PROGRAMMING 
LANGUAGES 


LISP 


LISP  MACHINES 

LIST 


A  list  of  morphemes  contained  in  a  NLP 
system's  knowledge  base. 

Programming  languages  whose  purposes 
are  to  program  and  solve  logic 
probl ems . 

The  most  widely  used  AI  programming 
language.  LISP  was  developed  by  John 
McCarthy  at  MIT  in  1958.  LISP  is  an 
object  oriented  programming  language. 

Computing  architectures  dedicated  to 
executing  LISP  programs. 

A  series  of  elements  (either  atoms  or 
other  lists)  enclosed  in  parentheses  in 
a  LISP  program. 


MACHINE  TRANSLATION 

*r. 


1 


A  proposed  application  of  natural 
language  processing  systems  'or  the 
translation  of  a  document  from  one 
language  to  another. 


META  A prefix used with AI subjects to denote the existence of
knowledge about the base word or subject, as in metaknowledge, which
is knowledge about the system's knowledge base.

MORPHEMES  A basic linguistic unit having meaning.

MORPHOLOGICAL ANALYSIS  A technique used in natural language
processing. The meaning and use of a word is determined by
separating the word into parts and determining the meaning of each
part.

NOISY DATA  Data having characteristics that introduce uncertainty
into the reasoning processes of AI systems.

OBJECT-ORIENTED PROGRAMMING LANGUAGES  Programming languages whose
purposes are to portray the characteristics and relationships
between objects.

PARSING  The act of breaking a sentence or phrase into its component
parts in order to identify the form, function, and syntactical
relationship between each part.

PRAGMATICS  Knowledge about human discourse and conversations,
concerning the overall context in which sentences or phrases are
written or spoken and how the various phrases are related to each
other.

PREDICATE CALCULUS  A formal language of symbol structures for
representing facts.

PREDICATE LOGIC  Logical operations that, through the manipulation
of propositions, allow one to make assertions.

PROCEDURAL KNOWLEDGE  Knowledge about procedures or actions,
typically specified as "If..., then..." types of production rules.

PRODUCTION RULES  Two-part statements that specify a certain action
to be taken when an antecedent condition is satisfied, usually in
the form of "If..., then..." types of rules. Procedural knowledge is
contained in production rules.

PROLOG  PROgramming in LOGic, developed by A. Colmerauer and
P. Roussel at the University of Marseille in 1973. Prolog is a logic
oriented programming language.

PROPERTY LIST  A construct in a LISP program that associates a
property and a corresponding value with each atom. Property lists
describe the "state of the world" in an AI program and therefore are
updated frequently.

PROPOSITIONAL LOGIC  Logic which, through the use of propositions,
determines the truth or falsehood of other propositions.

PRUNING  The act of eliminating a solution or group of solutions
from a problem's solution tree.

RULE  "If...Then" statements that support deductive reasoning.

RULE SET  A collection of rules that constitutes a module of
heuristic knowledge.

SCHEDULING  Determining the order of execution of processes in AI
programs.

SCRIPTS  Data structures that represent sequences of events. Scripts
represent procedural knowledge and are similar in principle to
frames.

SEMANTICS  The non-literal interpretation of word meanings.
Knowledge in this area is used extensively in speech and natural
language interpretation.

SEMANTIC NETWORK  A scheme for representing relationships between
objects in an AI system's knowledge base in terms of class
equivalence and inheritances.

SLOT  A single description of an object in a frame. Slots can
contain information such as name, color, definition, or value.

SOLUTION SPACE  A conceptual way of thinking of the possible
solutions to a problem, which has a direct bearing on the number of
branches that an AI system can search in the course of solving a
problem.

SOLUTION TREE  The representation of the possible solutions to an AI
problem by a tree with nodes representing solution types, and
branches representing their relationships.

SPEAKER DEPENDENT SPEECH RECOGNITION SYSTEMS  Speech recognition
systems that can only accept input from a particular speaker.

SPEAKER INDEPENDENT SPEECH RECOGNITION SYSTEMS  Speech recognition
systems that can understand input from any speaker without being
trained to each individual voice.

SPEECH RECOGNITION  The recognition of spoken language by a computer
system.

SYNTAX  The use of words to form phrases, clauses, and sentences.

TEMPLATE MATCHING  One way in which images and spoken words are
identified. Arrays representing frequency (sound for words, light
for images) or intensity versus time are matched against those of
known images and words for similarity.

TOP DOWN  An approach to problem solving which moves from some
current condition to an initial condition. See BACKWARD CHAINING and
GOAL DIRECTED INFERENCE.

TRUTH MAINTENANCE  The act of maintaining truth or consistency in
the elements of a knowledge base.

VISION  The understanding and interpretation of scenes or images by
a computer system.

WELL FORMED FORMULA (WFF)  A syntactically valid statement in
predicate calculus.


ABSTRACT

The need to drastically increase the processing rate of artificial
intelligence (AI) systems, coupled with the limitations of current
uniprocessor architectures, has resulted in a major research impetus
to develop a new generation of parallel systems. Similar to a number
of electronic symbolic computers being developed, optical computing
systems can be viewed as a representative of the class of
fine-grained, tightly-coupled architectures. This paper addresses
potential roles of optical systems in symbolic computing and
suggests future optical implementations of these architectures.


INTRODUCTION

The need to drastically increase the processing rate of artificial
intelligence (AI) systems, coupled with the limitations of current
uniprocessor architectures, has resulted in a major research impetus
to explore parallelism in modern computing systems. Applications of
these AI systems are placing stringent demands upon the underlying
computer architectures. To meet these challenges, a new class of
parallel computer architectures is emerging, structures which seek
to optimize the processing and retrieval of symbolic information.

One promising group of AI-driven architectures can be described as
tightly-coupled, fine-grained multiprocessor systems, which are
characterized by both a dependence on complex interprocessor
communications and flexibility in the interconnect topology.1 This
means that the architectures are composed of a large number of
similar, relatively noncomplex processors, which are coupled in such
a manner that a given processor can communicate with any other. The
driving features behind this new generation of computer
architectures are equally applicable to optical computing systems,
which are also characterized by tightly-coupled, fine-grained
elements with a high degree of communications flexibility. Optical
systems also provide a form of two-dimensional parallelism which
appears attractive for symbolic processing operations. At the
present time, we believe that optical systems can be adapted to the
types of operations and data structures encountered in AI, and offer
the promise of enhanced computational throughput to overcome
bottlenecks in existing AI systems.

This  paper  will  first  describe  some  of  the  aspects 
of  symbolic  computing  which  drive  the  need  for 
these  new  architectures.  As  a  case  in  point, 
opto-electronic interconnects are already being
investigated for interprocessor communications
in  computer  architectures.  Potential  roles  for 
optics  in  multiprocessor  systems  will  then  be 
addressed.  The  paper  will  conclude  with  a  discus¬ 
sion  of  current  efforts  to  utilize  optics  in 
symbolic  computing,  both  at  the  interconnect 
and  processor  levels,  along  with  proposed  optical 
implementations  which  look  promising  for  general 
purpose  symbolic  processing. 

MULTIPROCESSOR  SYSTEMS  AND  SYMBOLIC  COMPUTING 

Symbolic processing typically refers to ongoing research and
development in four functional areas: speech recognition, vision or
image understanding, natural language understanding, and expert
systems. All four disciplines are characterized by the following
three attributes: symbolic representations of knowledge; a high
degree of interaction with a stored grouping of symbolic knowledge;
and some level of reasoning capability, which draws conclusions by
comparing inputs to the system with elements retrieved from the
knowledge base. At the present time, each of these factors
(retrieval of knowledge, representation of knowledge, and reasoning)
limits the computational throughput rate of AI systems. The
remainder of this section will use expert systems as an example of a
symbolic processing domain.

An expert system is a machine which mimics or emulates the thought
and reasoning processes of a human expert. It seeks to utilize the
solution techniques which a human expert in a given discipline would
use to solve a particular problem in that domain. Knowledge
appropriate to the discipline is placed into the machine, forming
what is known as the knowledge base, enabling the system to
understand the problem. Besides this knowledge base, the expert
system consists of the "reasoner" or "inference engine", which
manipulates the symbolic information stored in the knowledge base so
as to "infer" new knowledge on its road to deriving a conclusion to
a particular problem.
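
The division of labor between knowledge base and inference engine can
be illustrated with a minimal sketch. It assumes a forward-chaining
strategy over if-then rules, which is only one of several strategies
an inference engine might use; the rule contents and the names
`forward_chain`, `RULES`, and `FACTS` are hypothetical and chosen
purely for illustration.

```python
def forward_chain(facts, rules):
    """Repeatedly fire if-then rules until no new fact can be inferred.

    facts: iterable of symbols already known to be true.
    rules: list of (antecedents, consequent) pairs; a rule fires when
           every antecedent is known.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if consequent not in known and set(antecedents) <= known:
                known.add(consequent)  # "infer" new knowledge
                changed = True
    return known


# Hypothetical knowledge base: two production rules and three facts.
RULES = [
    (("pump_running", "pressure_low"), "leak_suspected"),
    (("leak_suspected", "valve_open"), "close_valve"),
]
FACTS = {"pump_running", "pressure_low", "valve_open"}

inferred = forward_chain(FACTS, RULES)
```

Each pass of the loop compares every rule against the whole set of
known symbols, which is the kind of repeated matching against the
knowledge base that limits throughput on a single processor.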

Retrieval in expert systems may be considered at four hierarchical
levels: search of unorganized data, search of organized data,
content addressing, and heuristic searching. The amount of symbolic
information stored in a practical knowledge base is too large to
even consider an exhaustive search of unorganized data. Considerable
improvement can be realized by organizing the data to form data
bases, and current expert systems make extensive use of data base
management techniques to facilitate the retrieval process. But this
technology is still based on von Neumann architectures which greatly
hinder both the management of and access to data bases of the size
that will be needed for expert systems of the future. This problem
has led to considerable interest in content addressable memory
implementations of artificial neural systems and in logic-enhanced,
or smart, memories for accomplishing heuristic searches.
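
The first three retrieval levels can be contrasted in a short sketch.
This is a software analogy under stated assumptions, not a model of
any hardware described here: a sorted list stands in for organized
data, and a hash index stands in for a content addressable memory.
The record contents and all function names are hypothetical.
Heuristic searching is omitted because it depends on domain-specific
rules.

```python
from bisect import bisect_left

# Hypothetical knowledge-base records: (name, state).
RECORDS = [("valve", "open"), ("pump", "off"), ("tank", "full"), ("fan", "off")]


def exhaustive_search(records, name):
    """Level 1: scan unorganized data; cost grows linearly with size."""
    for key, value in records:
        if key == name:
            return value
    return None


# Level 2: organize the data once (sorted by name), then search it.
SORTED_RECORDS = sorted(RECORDS)
SORTED_KEYS = [key for key, _ in SORTED_RECORDS]


def organized_search(name):
    """Binary search over sorted keys; cost grows logarithmically."""
    i = bisect_left(SORTED_KEYS, name)
    if i < len(SORTED_KEYS) and SORTED_KEYS[i] == name:
        return SORTED_RECORDS[i][1]
    return None


# Level 3: address records by their content rather than their location.
CONTENT_INDEX = {}
for key, value in RECORDS:
    CONTENT_INDEX.setdefault(value, []).append(key)


def content_addressed(value):
    """Return every record name whose stored state matches `value`."""
    return CONTENT_INDEX.get(value, [])
```

For example, `content_addressed("off")` returns the names of all
records whose state is "off" without visiting the other records.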

Both the artificial neural systems and the
logic-enhanced memories fall into the category
of tightly-coupled, fine-grained architectures.
That is, they consist of thousands (or even
millions) of switching or processing nodes
(fine-grained) which are interconnected to a
high degree (tightly-coupled). The nodes of
the neural network are just memory elements,
but these elements are interconnected in a
global fashion, so that processing associated
with data retrieval may be distributed over
all of the interconnected nodes. Although
such  networks  will  likely  see  use  for  symbolic 
computing,  especially  for  associative  recall 
of  knowledge  base  information,  neither  the 
individual  nodes  nor  the  networks  themselves 
have  the  processing  power  needed  for  advanced 
expert  systems.  Architectures  with  actual 
processing  elements  at  each  node  represent 
the  latest  thinking  of  computer  scientists 
dealing  with  artificial  intelligence.  These 
systems  have  been  labeled  with  such  names  as 
logic-enhanced  memories,  smart  memories,  and 
connection  machines. 

Logic-enhanced memories avoid the von Neumann bottleneck by
intermixing the processing and the memory functions. This can be
viewed either as distributing the memory among a large number of
tightly-coupled processors, or as providing some processing
capability to each element of a memory (hence the name
logic-enhanced memory). This permits such powerful functions as
interconnect reconfiguration and internodal relationship designation
(e.g., for semantic network representations, to be discussed). These
memories have such a broad processing power that they may be
categorized as fine-grained, tightly-coupled multiprocessors.

Both the neural networks and the logic-enhanced memories are
parallel processors; that is, any number of nodes or any of the
interconnects can be active at any given time. It should also be
noted that several optical implementations of neural networks and
logic-enhanced memories have been developed, including associative
processors3,4 and memories incorporating a feature known as
attention.5 These systems all take advantage of the fine-grained
nature of the optical devices and the types of global communications
operations which are possible in optical computing systems.

A classification of parallel systems has been developed by Seitz,6 a
modified version of which is shown in Figure 1. It provides an
interesting categorization based on the number of processors and the
relative degree of processor complexity. Conventional uniprocessor
architectures are plotted as a point of reference representing high
complexity in a single processor. As one moves up to more than one
processor, the trend is toward reduced complexity within each
processor, a trend that is driven by total system cost and
reliability, on the one hand, and by an escalating overall system
complexity on the other hand.


Figure 1

Classification of Parallel Processors by Nodal
Complexity

Microcomputer arrays are basically a set of computers that send
messages to one another via a communication network. Such systems
are usually loosely-coupled, that is, the individual computers do
not share main memory and I/O devices, although one computer can
always draw upon another's resources through the communication
network. The application of such systems in symbolic computing will
likely be in solving problems that involve interactions between more
than one knowledge base. Each processor can work on a given part of
the problem in such a way as to minimize the need for interprocessor
communications.

Computational arrays are systems whose processing
elements have been designed for tasks of
comparable complexity to floating-point
operations. Systolic arrays, for which the
processors are connected in regular patterns
that match the flow of data in the computations,
comprise most of the architectures in this
category.

As mentioned above, the architectures of most
interest to knowledge base retrieval are the
more fine-grained, tightly-coupled systems,
such as logic-enhanced memories and neural
networks.

Random access memories are shown in Figure 1
as a point of reference at the fine-grained
end, just as uniprocessors were shown as a
reference for nodal complexity.

To this point the discussion has centered on the knowledge base
retrieval process so fundamental to expert system operation. The
second major aspect of expert systems, as mentioned above, is
knowledge representation. Figure 2 illustrates one popular method
for representing knowledge, known as a semantic network. A semantic
net may be characterized as a graphical representation scheme in
which the graph nodes represent objects or concepts and the links
represent inference procedures that relate the nodes. The importance
and complexity of connectivity in these knowledge representation
schemes has resulted in serious consideration of the tightly-coupled
multiprocessor architectures for symbolic computing and expert
systems implementations. The processing power at each node can, for
example, be used to define the internodal relationships of the
semantic network.
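
A semantic network of this kind can be sketched as a labeled graph.
The example below is an illustrative assumption rather than the
representation used in Figure 2: the class name `SemanticNet`, the
relation names, and the sample nodes are hypothetical. It shows how
following "is_a" links lets a node inherit properties stored at
other nodes, which is why retrieval cost depends so heavily on
connectivity.

```python
class SemanticNet:
    """Nodes are concepts; labeled links relate one concept to another."""

    def __init__(self):
        # node -> {relation -> set of target nodes}
        self.links = {}

    def add(self, source, relation, target):
        self.links.setdefault(source, {}).setdefault(relation, set()).add(target)

    def related(self, node, relation):
        """Targets reached from `node` over one link labeled `relation`."""
        return self.links.get(node, {}).get(relation, set())

    def inherited(self, node, relation):
        """Collect `relation` targets from `node` and all its is_a ancestors."""
        found, seen, stack = set(), set(), [node]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            found |= self.related(current, relation)
            stack.extend(self.related(current, "is_a"))
        return found


# Hypothetical network with three concepts.
net = SemanticNet()
net.add("canary", "is_a", "bird")
net.add("bird", "is_a", "animal")
net.add("bird", "can", "fly")
net.add("canary", "color", "yellow")
```

Here `net.inherited("canary", "can")` reaches "fly" only by
traversing the link from "canary" to "bird". On a serial machine
each traversal step is a separate memory access, whereas a
tightly-coupled architecture could in principle follow many links at
once.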

The similarity of these highly connected architectures to
neurological systems lends credence to their importance in symbolic
processing. We are very aware of the power of the brain in
performing intelligent operations such as reasoning and pattern
recognition, yet the brain consists of relatively slow switching
elements. The biological switch is the neuron, and it operates in
the millisecond range, about a million times slower than current
electronic switching speeds. The difference lies in the large degree
of connectivity between biological switches, leading to a high
degree of parallel processing. Neurons in the brain can have upwards
of 10,000 synapses (biological connectors), whereas electronic
switches in today's computers typically have only a few connections
to other switches. There is strong evidence to suggest that the
processing power of the brain is related to the high degree of
connectivity between the neurons, permitting parallel processing and
thereby compensating for the slow switching speeds.


Figure 2

Semantic Networks

The next section will focus on potential roles for optics in
multiprocessor systems. This will lead to a discussion of current
efforts to couple optical computing with AI, and will conclude with
two proposals for optical architectures which could significantly
enhance the computational throughput of fine-grained,
tightly-coupled, multiprocessor systems. Such optically-based
systems, if realized, could open up a whole new field of
opto-electronic computing directed toward artificial intelligence
applications.

OPTICS  AND  MULTIPROCESSORS 

Any  approach  to  optical  architectures  must  give 
serious  consideration  to  what  can  be  accomplished 
with  existing  technologies,  namely  VLSI 
electronics. 

Optical  computing  will  not  seriously  threaten 
electronic  computing  unless  it  can  offer  several 
orders  of  magnitude  improvement  in  some  critical 
measurement  criterion,  such  as  the 
power-speed-cost product, in a given problem
domain.  Therefore,  a  good  starting  point  in 
addressing  the  application  of  optics  to  symbolic 
computing  is  to  identify  problem  areas  for 
electronics. 

It is not surprising that the relative weaknesses and strengths of
electronics and optics are traceable in one way or another to the
fundamental physics of inter-electron and inter-photon interactions.
Relatively speaking, the interaction between photons is weak; hence,
electrons are good for the switching operations so fundamental to
computing, and photons are good for the inter-switch communications,
providing links which are free from detrimental coupling effects
that lead to crosstalk and capacitive loading. Subscribing to such
reasoning, however, is impractical due to the quantum losses which
accompany both the electron-to-photon and the photon-to-electron
conversions. There is research and development underway to replace
some of the longer interconnect links within computers with optical
channels because it is the longer interconnects that create severe
power, speed, and space problems for electronics. But such a
capability stops far short of using optics to its full advantage in
multiprocessor architectures appropriate for symbolic computing.

Consider the electronic-switching/optical-communications position as
representing one of the four corners of the square shown in Figure
3. The sides of the square represent a continuum of combinations
between the extremes of the corners. The upper left corner
represents all-electronic systems while the bottom right represents
all-optical. Since movement toward the bottom left corner is not
technically practical, the focus is along the upper and right sides.
When considering computing systems for which switching is the
predominant function, the trade-off between optics and electronics
is seen to fall somewhere along the upper edge, that is, all
electronic switching with some optical links. However, symbolic
processing places a strong emphasis on connectivity, as was
discussed above. The de-emphasis on switching (fine-grained
architectures with low nodal complexity) and the emphasis on
communications (tightly-coupled systems) leads one to consider
architectures for which the communications is optical and only some
of the switching is done with electronics. It is this category of
electronic/optical hybrid architectures that can have a significant
impact on symbolic computing.


The upper right hand corner of Figure 3 refers to the set of
architectures which utilize optical switching for interconnect
reconfiguration and electronic switching for logic operations.
Optics proves to be especially valuable in providing both the longer
and the more global interconnects in processing, due to the combined
power-speed-space-crosstalk penalties associated with electronic
interconnects. Several examples of the use of optical interconnects
in advanced architectures are currently under development, both in
coarse and fine-grained computer structures. The interest in optics
arises from the high bandwidth communications requirements between
the individual nodes and the number of parallel channels which can
effectively be multiplexed over a single optical link. The Texas
Reconfigurable Array Computer (TRAC), for example, is studying fiber
optics for internodal board-to-board communications at bandwidths
between 100 and 500 Kbps, and the WARP system is investigating the
use of optical interconnects for intercell communications at a rate
of 0.5 to 1.0 Gbps and 32:1 multiplexing. Another example, that of a
fine-grained electronic symbolic computer under development, is the
Connection Machine.2 It is composed of 65,536 individual processing
elements (PEs), organized as a large array of printed circuit
boards, each of which contains 512 PEs equally divided between 32
chips. Both guided and unguided optical interconnects are being
studied for overcoming challenges to this architecture, such as
clock skew, communication rates of up to 3.0 Gbps over 32:1
multiplexed links, and interprocessor broadcast.
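
The Connection Machine figures quoted above imply a board and chip
count that the text does not state directly. The short calculation
below derives them; the variable names are illustrative, and the
derived values follow only from the three numbers given in the text.

```python
# Figures quoted in the text.
TOTAL_PES = 65_536        # processing elements in the machine
PES_PER_BOARD = 512       # processing elements on each circuit board
CHIPS_PER_BOARD = 32      # chips on each circuit board

# Derived quantities.
boards = TOTAL_PES // PES_PER_BOARD               # 128 boards
pes_per_chip = PES_PER_BOARD // CHIPS_PER_BOARD   # 16 PEs on each chip
total_chips = boards * CHIPS_PER_BOARD            # 4,096 chips
```

With 128 boards and 4,096 chips, a large share of the interprocessor
traffic has to cross chip and board boundaries, which is where the
longer interconnects discussed above arise.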

Over the longer term, architectures such as that shown in Figure 4
could be developed to effectively integrate opto-electronic
components in computer systems. Such a hybrid structure would use
optics for most communications requirements and electronics for
processing element functions. For the sake of simplicity, the
illustration shows only two of the many possible boards and only
four chips/board. If this were a fine-grained processor, each chip
could contain many PEs.


Figure 3

Electronic versus Optical Computing


Figure 4

Hybrid Optical/Electronic
Multiprocessor Architecture

Each board in Figure 4 contains four optoelectronic chips and one
frequency selective filter (hologram). In between each board is a
planar array of reconfigurable diffraction gratings, which perform
the majority of the switching operations involved in the
interconnection process. This particular architecture employs
wavelength division multiplexing (WDM) to direct optical bit streams
to the appropriate board. One labeled beam in the figure illustrates
this operation. The hologram directly above the transmitting chip
directs the beam to the center of the next board, where it is
superimposed on the main beam which travels to all of the system's
boards. Upon reaching the intended board, the frequency selective
filter diffracts the beam to a bus-to-board hologram which directs
the beam to its final destination.
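
The routing role of WDM in this scheme can be sketched abstractly.
The model below is a loose software analogy under stated
assumptions: each board is assigned one wavelength, the shared main
beam is modeled as a list of (wavelength, payload) pairs, and the
wavelength values and function names are hypothetical. It does not
model the holograms, gratings, or any optical physics.

```python
# Hypothetical wavelength assignment, one per board (nanometres).
BOARD_WAVELENGTH_NM = {0: 1300, 1: 1310, 2: 1320}


def transmit(dest_board, payload):
    """Tag a bit stream with the wavelength assigned to its destination."""
    return (BOARD_WAVELENGTH_NM[dest_board], payload)


def filter_at_board(board, bus):
    """Model a frequency selective filter on the shared main beam.

    Returns (payloads diffracted onto this board, beams passed onward).
    """
    wavelength = BOARD_WAVELENGTH_NM[board]
    received = [payload for wl, payload in bus if wl == wavelength]
    passed_on = [(wl, payload) for wl, payload in bus if wl != wavelength]
    return received, passed_on


# Two bit streams share the main beam; each board picks off its own.
bus = [transmit(2, "stream-a"), transmit(1, "stream-b")]
received_at_1, remaining = filter_at_board(1, bus)
```

The point of the sketch is that the destination is encoded in the
wavelength, so every board can see the same main beam while only the
intended board extracts a given stream.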

The intra-board and intra-chip interconnects would be handled by the
plane of holograms above the board, as also illustrated in the
figure. The logistics of handling a large number of multiplexed
beams will not be discussed here, other than to say that the optical
switching most likely will be achieved through nonlinear wave
mixing. For example, four wave mixing may be used to generate
holograms which can be rapidly varied to permit interconnect
reconfiguration.

Diffraction grating writing beams would contain the desired
information for changing the holographic gratings. Note that some of
the switching actions of such an architecture are being performed
optically rather than electronically.

As one moves toward the bottom right corner of the classification
scheme presented in Figure 3, the percentage of optical
implementation increases until an all-optical architecture is
achieved. While a number of efforts are underway to develop such
all-optical structures, the research is currently directed at
defining the appropriate computational primitives for optical
symbolic processing. Thus the present focus is constrained to
identifying and prototyping systems which can process the
fundamental operations associated with symbolic computing, namely
the performance of correlation, searching, and pattern matching
operations on symbolic data.

Typical of a class of research applications are the optical
inference machines. Here, the emphasis is on developing
optoelectronic architectures which can perform the fundamental
matching and logic operations encountered in retrieving symbolic
data from a database. These sets of operations are both language and
representation specific, so that architectures are being analyzed
for several of the major different AI programming paradigms.
Examples of other representations being investigated by optical
computing researchers include semantic networks, graph theoretical
representations, symbolic substitution for binary data, and
shadowcasting structures.


Figure  5 

All-Optical  Multiprocessor  Architecture 


An example of a fine-grained, tightly-coupled optical symbolic
computer of the future is shown schematically in Figure 5. Although
no one has built such a computer, it is technically believable to
achieve such a system consisting of 1 million parallel channels.
This does not mean that the system would necessarily be configured
with 1 million nodes, since this implies that the planar array of
logic elements (designated as the gate array) would have just one
logic element per channel. Instead, several logic elements would
usually be interconnected via the interconnect media to form a
processing element. For example, a square array of n x n logic
elements (gates) may comprise an arithmetic logic unit, several
registers, and possibly some cache memory. An example of this type
of structure is shown in Figure 6, where individual elements in a
2-D SLM have been assigned the necessary functions to comprise a
computational processing element. Taking an n of 5 (25 logic
elements/processor) would lead to a machine with 40,000 nodes, large
enough to be practical as a symbolic computer.
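
The node count quoted above follows directly from dividing the
channel count by the number of logic elements per processing
element. The sketch below makes that relationship explicit; the
function name `node_count` is illustrative, and the calculation
assumes one logic element per channel, as the text does.

```python
def node_count(channels, n):
    """Processing elements available when each uses an n x n gate block."""
    return channels // (n * n)


CHANNELS = 1_000_000  # parallel channels assumed in the text

# Node counts for a few block sizes; n = 5 is the case quoted in the text.
nodes_by_n = {n: node_count(CHANNELS, n) for n in (1, 5, 10)}
```

The node count falls with the square of n, so richer processing
elements come at a steep cost in the number of nodes: going from
n = 5 to n = 10 reduces the machine from 40,000 nodes to 10,000.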


Figure  6 

An All-Optical Processing Element
(Detail of Figure 5)

Input to the optical computer could be via either an array of
independently addressable laser diodes or a two-dimensional spatial
light modulator (2D SLM). The diode array would be capable of much
higher modulation speeds, but would involve more complex circuitry,
especially if operation requires uniformity over the complete array.
If the input already exists as a two-dimensional light pattern, such
as might be output from a vision processor, an input device may not
be needed (depending on the compatibility of the two processors).

The logic element array could be either a 2D
SLM exhibiting a nonlinear response or an array
of optical bistable switches. The latter device
will ultimately lead to much higher switching
speeds, but current realizations of optical
bistable switches require impractical power
levels. Improved nonlinear optical materials
are needed to achieve widely utilized optical
bistable devices.

The interconnect element will likely employ
wave mixing in a nonlinear optical medium,
similar in operation to that mentioned previously
for the hybrid architecture. However, due to
the much larger number of channels that must
be handled, the switching may be done in a
multistage fashion, in which multiple parallel
planes of real-time hologram arrays would be
exercised.

The detector will be a major technological
challenge. In the most general case, one would
like a one million channel device, with each
channel operating around 1 MHz (projected speed
for 2D SLMs). However, the requirements will
be much less for most practical processor designs.
If the problem domain were to require, say,
100 iterations or more (e.g., semantic network
searches to depths of at least 100), an output
would be required only once every 100
microseconds. This reduces the throughput rate
of the detector to 10^10, a number more in line
with projections for GaAs microelectronics.
Another example would be where each processor
consists of a block of n x n channels as discussed
above. Assuming an n equal to 4 and that each
processor has just one output channel, the
throughput requirement of the detector would
be 6.25 x 10^10. Some combination of these two
designs should yield a detector requirement
that would be well within near-term technical
feasibility.
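
The detector figures above can be checked with simple arithmetic. The sketch below assumes that throughput is counted as channel outputs per second, which is an interpretation and not a unit stated in the text; the function and parameter names are illustrative only.

```python
CHANNELS = 1_000_000         # one million parallel channels
CHANNEL_RATE_HZ = 1_000_000  # about 1 MHz per channel (projected 2D SLM speed)


def detector_rate(channels, rate_hz, iterations_per_output=1, channels_per_output=1):
    """Channel outputs per second that the detector must accept (assumed unit)."""
    return channels * rate_hz // (iterations_per_output * channels_per_output)


general_case = detector_rate(CHANNELS, CHANNEL_RATE_HZ)
# Output needed only once every 100 iterations (once every 100 microseconds).
iterative_case = detector_rate(CHANNELS, CHANNEL_RATE_HZ, iterations_per_output=100)
# One output channel per 4 x 4 block of channels.
blocked_case = detector_rate(CHANNELS, CHANNEL_RATE_HZ, channels_per_output=4 * 4)

print(general_case, iterative_case, blocked_case)
```

Under these assumptions the general case is 10^12, the iterative case 10^10, and the 4 x 4 block case 6.25 x 10^10, matching the values in the paragraph.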

The last major component of this all-optical
architecture is the memory. The practice in
electronics of co-locating some of the memory
with the logic elements need not necessarily be
transferred to the optical computing domain
because of the greatly reduced communications
delays. Thus, Figure 5 shows the main memory
as a single block, equally shared by all of
the processors.


CONCLUSION 

The ultimate objective of artificial intelligence
is to expand the power and reasoning processes
in computing machines, allowing these machines
to emulate and achieve capabilities typically
associated with intelligent behavior in humans.
However, efforts to achieve these goals have
been severely constrained by today's serial
architectures and by the separation of the
processing and memory functions. Fine-grained,
tightly-coupled multiprocessors appear to be a
viable class of architectures for symbolic
processing. Optical techniques, in the form of
opto-electronic interconnects and optical
computing, are being investigated as a means of
alleviating computational bottlenecks in these
systems.

Opto-electronics may play a major role in making
these systems a reality, either as a supplement
to an existing architecture or as part of an
integrated opto-electronic computer. Research
to date has demonstrated the viability of optical
interconnects, and a number of efforts are
underway to incorporate opto-electronics into
existing architectures. At another level, the
potential exists for developing all-optical
symbolic processors as part of a larger scale
computational environment. This appears
particularly promising, since the 2D SLMs are in
fact fine-grained PEs in a tightly-coupled
environment. Such an all-optical system could
serve as a co-processor in symbolic processing
systems, and research is underway to identify
the appropriate computational primitives and
compatible representations.

Interestingly enough, at the present time, the
numeric "supercomputers" execute AI functions
at a greater rate than dedicated AI machines.
This may imply that raw speed is an important
component in overcoming existing AI computational
bottlenecks. It also may indicate that
architectures designed to improve numeric
throughput may also be useful in symbolic
computation, and vice versa. Regardless of which
of these paths is developed, the utilization of
optics in symbolic processing represents an
exciting new direction for optical computing.


EXECUTIVE SUMMARY

Image understanding is the process by which a computer interprets a
scene.  Significant  progress  has  been  made  over  the  last  few  years  on 
building  computer  vision  systems.  Systems  have  been  developed  that 
operate  on  complex  real  imagery  in  such  tasks  as  medical  diagnosis, 
inspection  of  industrial  parts,  and  interpretation  of  remotely  sensed 
data.  Most  of  these  systems  are  designed  to  be  special  purpose,  with 
excessive  domain-specific  constraints.  These  domain  specific  designs  make 
development  of  new  systems  very  time-consuming  and  expensive.  As  a 
result, there is a clear need to develop image understanding systems that
can deal with complex imagery and are also robust enough to be extended to
other domains.

The  development  of  these  general  purpose  vision  systems  has  proved 
difficult  and  complex.  Research  in  the  last  few  years,  however,  has 
opened  the  door  to  an  array  of  computational  theories  of  vision  that  will 
prove  helpful  in  outlining  processes  that  have  application  over  a  wide 
range  of  domains.  Many  of  these  theories  have  been  prompted  by  studies  on 
biological  vision  systems.  This  report  will  review  a  series  of  the 
computational processes that appear promising for designing a robust image
understanding  system. 

The  computational  processes  used  in  image  understanding  are  of  two 
general  forms:  numeric  and  symbolic.  The  numeric  processing  associated 
with  computer  vision  is  called  low  level  processing.  Low  level  vision  may 
be  characterized  as  a  data  representation  of  a  scene  in  terms  of  sensor 
data  without  reference  to  knowledge  of  the  content  of  the  scene.  A  series 
of preprocessing operations is commonly performed at the low level prior
to interpretation, such as image formation and transformation, image
restoration  and  enhancement,  and  image  registration.  Low  level  vision  is 
also  the  level  at  which  primitive  yet  rich  descriptions  of  a  scene  are 
extracted.  Some  of  these  primitives  are  edges,  lines,  and  regions.  Image 
segmentation  is  also  a  part  of  low  level  processing. 

Symbolic processing is known as high level processing. Understanding
a scene entails far more than number crunching; rather, it involves
representing image features in a symbolic form, similar to the way the
human visual system processes information. The purpose of high level
processing is to operate on intermediate representations derived from
earlier vision processes in support of such functions as feature
aggregation (perceptual grouping), object detection and recognition, scene
interpretation, and activity and event description. These intermediate
representations are in the form of symbolic representations. Processing
over these symbolic forms is more concerned with global relationships,
graph matching, and inference programming than with low level concerns
such as neighborhood processing, template matching, and numerical
accuracy. The final objective of high level vision is to form a coherent
description of the whole scene which combines the object information
resident at the lower processing levels with previously stored information
reflecting knowledge about the physical environment. This may involve
contextually piecing together image features using syntactic (semantic)
techniques, or through more heuristic methods.

The gap between these two levels is bridged by the intermediate level of
processing. Intermediate level processing creates the symbolic forms that can
be  used  for  processing  in  the  high  level.  Examples  of  symbolic  forms 
involve  describing  an  image  in  terms  of  regions  and  line  segments  as  well 
as  their  associated  attributes.  The  goal  of  intermediate  level  processing 
is  to  determine  which  features  are  relevant  and  the  appropriate  labeling 
of  these  features. 

Implementation of image understanding algorithms is a key consideration
for computer vision. Many computer vision systems require real-time or
near real-time processing capabilities. Many of the traditional
sequential von Neumann architectures lack the processing capabilities
necessary for many of the image understanding processes. The solution has
been to implement the image understanding algorithms in a parallel
environment. Many different parallel architectures have been prototyped
to widen the computational bottleneck that results from the image
understanding process.

Optical processing provides an alternative for meeting the computational
requirements of image understanding. The two principal advantages of
optical computing are parallel processing, which maps nicely onto many
standard image processing tasks, and its real-time operation. Optical
computing offers a potential solution for processing over both the numeric
and symbolic forms associated with image understanding. The general
purpose optical computer is still some time away, requiring further
development of logic gates, etc. There are, however, optical architectures
that offer a reasonable near term solution to the computationally
intensive numeric processing requirements. These procedures are very
competitive with conventional digital electronics for reasonably
well-defined problems. Also, many of the operations performed in high level
symbolic processing require high processing throughput at a relatively low
degree of accuracy. Optics appears very amenable to this characteristic.

Whether optical computing can solve all problems in computer vision
is still undetermined; however, many possibilities exist. For example,
alternatives exist which combine both digital and optical technologies.
A hybrid digital/optical system can be realized, where front-end
processing is performed by an optical system and intermediate to high
level processing is performed by an intelligent back-end processing
architecture. We should keep in mind that interfacing optical
communication channels to electronic devices presents challenging
technical problems. This report will analyze how well an optical parallel
network can be used to solve the problem of image understanding in a
computer vision system.

An  introduction  to  image  understanding  is  given  in  Chapter  I.  The 
three  levels  of  processing  which  are  involved  in  image  understanding  will 
be explained. We will also examine some of the inherent problems
associated with algorithm formulation.

Chapter  II  examines  the  procedures  used  in  early  level  processing. 
Some  of  the  classic  algorithms  will  be  analyzed  along  with  the  rationale 
for using them. Low level processing basically involves preprocessing and
creation of a primal sketch. Preprocessing operations compensate for
degradations resulting from sensor characteristics. The primal sketch is
derived  from  a  local  neighborhood  of  image  primitives.  In  many  cases 
early  level  processing  is  a  key  step  in  the  entire  scene  interpretation 
process  since  it  lays  the  foundation  by  which  further  processing 
progresses. 

Chapter  III  describes  the  role  of  segmentation  in  image  understanding 
and  examines  the  algorithms  which  have  had  success  in  many  applications. 
Segmentation  is  the  process  of  pixel  classification,  where  the  image  is 
segmented  into  subsets  by  assigning  the  individual  pixels  to  classes  in 
order  to  generate  a  description  of  the  scene.  Texture  and  optical  flow 
will  also  be  examined  as  aiding  in  the  segmentation  process.  Texture  is 
the  spatial  distribution  of  pixels.  Many  techniques  have  been  developed 
to  describe  this  arrangement,  and  we  will  examine  one  that  has  had  much 
success.  Knowledge-based  segmentation  will  also  be  explained  along  with 
some  of  the  artificial  intelligence  (AI)  techniques  used  in  creating  a 
knowledge-driven process. AI techniques are used for guiding and
controlling the segmentation process.

Chapter IV looks at intermediate level processing for image
understanding. We will step through some of the classification techniques
that have been developed for pattern recognition, both statistical and
syntactic. We will also describe the importance that intrinsic
characteristics of an image have for image understanding. These intrinsic
qualities are an
important  source  of  information  for  analyzing  three  dimensional  scenes. 
Finally  a  powerful  computational  tool,  called  relaxation,  will  be 
examined.  Relaxation  is  attractive  since  it  is  amenable  to  parallel 
implementation. 

Chapter V explains the operations used in high level symbolic
processing. Processing over symbolic representations of image elements
provides the easiest way for a computer to understand a complex scene.
Symbols supply information such as form, structure, and groupings, whereas
processing over symbols provides information such as relations,
occurrences, and matching. Symbolic processing over image elements allows
the vision system to make inferences about a scene, similar to the way a
human would. Many of the techniques used to process over these symbolic
forms are based on artificial intelligence (AI) techniques. We will
examine these AI techniques and provide a functional architecture
dedicated to high level symbolic processing. Finally, we will briefly
describe LISP, which is a popular symbolic data processing language.

In  Chapter  VI  an  example  is  given  which  demonstrates  the  techniques 
used  in  image  understanding.  In  particular,  the  scene  is  an  IR  image  of  a 
tank.  It  starts  at  the  low  level  stages  and  proceeds  through  the  higher 
level  processes.  In  this  process  the  original  image  is  transformed  into  a 
number of forms, each of which is more symbolic than the previous one.
The techniques used to arrive at each form are explained, as well as the
rationale for creating them.

Finally, Chapter VII summarizes the content of the report and
provides  a  table  that  lists  many  of  the  important  developments  that  have 
been  reviewed.  In  addition,  directions  for  further  research  are  outlined. 

Throughout the entire text, references are made to parallel
implementations of the image understanding algorithms, both through digital and
optical  architectures. 

The processing requirements needed to solve the machine vision
problem are not well understood. Presently no one can provide a detailed
algorithm specification for a general vision interpretation system.
However, it is possible to list the processing features which are
necessary to significantly advance machine vision technology.

By machine vision, or image understanding by computer, we mean much
more than image processing, which usually refers to the enhancement and/or
restoration of images. Rather, the goal of machine vision is the
automatic transformation of an image to a symbolic form that represents a
description and understanding of the content of the image.

The image understanding process can be thought of as an iconic to
symbolic (or signal to symbol) transformation. After performing such
common image processing operations as edge detection and segmentation,
which give the "primal sketch" as proposed by Marr,1 and grouping the
primitives which result from these operations in terms of regions,
contours and boundaries, we are left with iconic features of the raw image
data. To perform image interpretation the vision system must transform
these iconic forms into symbolic form. The transformation is from a low
level (e.g., pixel at location (20,100) has a red intensity value of 28)
to a symbolic representation of the object in a scene, in terms of some
predefined knowledge about objects in the world (that is, region 28 in the
image corresponds to a particular object class TANK-GUN).

The  machine  vision  problem  can  be  described  by  three  levels  of 
processing  -  low,  intermediate  and  high.  The  low  level  consists  mainly  of 
operations  on  pixels  and  local  neighborhoods  of  pixels.  This  may  involve 
segmentation  algorithms  to  partition  pixels  into  regions  of  similar  color 
or  texture  properties,  and/or  extracting  lines  through  intensity  and  color 
discontinuities at local edges. The result of this low level processing
is the transformation of a raw image to an image with regions and line
segments.
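
The partitioning of pixels into regions can be illustrated with a small sketch. It assumes the simplest possible similarity criterion (intensity at or above a threshold) and 4-connected neighborhoods; both are illustrative choices, not the segmentation algorithms examined in this report, and the name `label_regions` is hypothetical.

```python
from collections import deque


def label_regions(image, threshold):
    """Label 4-connected regions of pixels whose intensity is >= threshold.

    Returns a label image (0 = background) and the number of regions found.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    region_count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or labels[r][c]:
                continue
            # Start a new region and flood-fill it through local neighborhoods.
            region_count += 1
            labels[r][c] = region_count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] >= threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = region_count
                        queue.append((ny, nx))
    return labels, region_count


image = [
    [9, 9, 0, 0],
    [9, 0, 0, 7],
    [0, 0, 7, 7],
]
labels, count = label_regions(image, threshold=5)
print(count)
for row in labels:
    print(row)
```

Each pixel decision depends only on a pixel and its immediate neighbors, which is the kind of local operation the text assigns to the low level.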

The intermediate level of representation is an interface between the
low level processing at the pixel level and the symbolic elements
representing visual knowledge stored in a database. The intermediate level
consists of a symbolic description of the two dimensional image in terms
of regions and line segments as well as their associated attributes.
There are two general tasks for intermediate level processing. The first
task entails the extraction of features for regions, lines, and vertices
as well as relations between these elements. For example, regions supply
information about intensity, texture, location, compactness, major axis
orientation, etc. Line segments provide information on location,
orientation, length, width, contrast, etc. Finally, vertices convey
location, connection of line segments, curvature, etc. The second task
involves grouping, splitting, and labeling processes. These methods are
used to form intermediate features which more naturally match stored
object descriptions. Some operations are: (1) labeling points of high
curvature on the perimeter of a region; (2) merging co-linear line
segments; and (3) merging adjacent line segments.
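
Operation (2) above, merging co-linear line segments, can be sketched as follows. This is a minimal illustration under stated assumptions: segments are pairs of (x, y) endpoints with nonzero length, co-linearity is tested with a cross product against a fixed tolerance, and segments are merged only when the gap between them along the line is at most `max_gap`. The function names and thresholds are hypothetical.

```python
import math


def collinear(seg_a, seg_b, tol=1e-6):
    """True when both endpoints of seg_b lie on the infinite line through seg_a."""
    (x1, y1), (x2, y2) = seg_a
    dx, dy = x2 - x1, y2 - y1
    # The cross product of the direction with (point - start) vanishes on the line.
    return all(abs(dx * (py - y1) - dy * (px - x1)) <= tol for px, py in seg_b)


def merge_if_collinear(seg_a, seg_b, max_gap=1.0):
    """Return the merged segment, or None when the two segments should stay separate."""
    if not collinear(seg_a, seg_b):
        return None
    (x1, y1), (x2, y2) = seg_a
    length = math.hypot(x2 - x1, y2 - y1)  # assumes seg_a is not degenerate
    ux, uy = (x2 - x1) / length, (y2 - y1) / length

    def position(point):
        # Signed distance of a point along seg_a's direction.
        return (point[0] - x1) * ux + (point[1] - y1) * uy

    a_lo, a_hi = sorted(position(p) for p in seg_a)
    b_lo, b_hi = sorted(position(p) for p in seg_b)
    if max(a_lo, b_lo) - min(a_hi, b_hi) > max_gap:
        return None  # co-linear but too far apart to be one feature
    points = sorted(list(seg_a) + list(seg_b), key=position)
    return points[0], points[-1]


print(merge_if_collinear(((0, 0), (2, 0)), ((2.5, 0), (5, 0))))
print(merge_if_collinear(((0, 0), (2, 0)), ((0, 1), (2, 1))))
```

The first call merges two fragments of one horizontal edge; the second leaves two parallel but distinct edges unmerged.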

The high level processing controls the intermediate level of
processing, where the symbolic two-dimensional representations of the
intermediate level must be related to object descriptions stored in some
knowledge base. That is, the objective of high level processing in image
understanding is to operate on the intermediate representations (e.g.,
2 1/2 D sketch, intrinsic images) derived from early vision processes in
support of such functions as feature aggregation (perceptual grouping),
object detection and recognition, scene interpretation, and activity and
event description. In addition, high level processing can assist in
focusing attention and resolving conflicts or uncertainty at low levels of
the vision hierarchy.

In studying high level processing, two major classes of requirements
predominate. The first class concerns knowledge representation, whose
requirements include the need to represent analogical, propositional and
procedural knowledge, as well as iconic, geometric and relational
structures. In addition it is desirable that the representations be easily
extensible, and facilitate rapid access and processing. The second class
of requirements deals with processing, and includes the need to support
data-driven and goal-driven processing, hierarchical and heterarchical
control, parallel and serial processing, and inquiries into knowledge
representations.

Part  of  this  process  entails  control  strategies  for  matching,  merging 
and finally inferring the presence of related objects. The result of this
high  level  of  processing  is  a  symbolic  representation  of  the  content  of  a 
specific  image  in  terms  of  a  general  stored  knowledge  of  the  object 
classes  and  the  physical  environment. 

It is important to realize that the flow of information between all
three levels of processing is in both directions. That is, the image
understanding process can be data-directed (bottom-up), knowledge-directed
(top-down), or both. In the upward direction, from the low level to
higher levels, the communication consists of results from segmentation
procedures that include a set of attributes of an extracted image event to
be stored in a symbolic representation. The information allows processes
at the higher levels to evaluate the success of the lower level
operations. It is also a mechanism for passing actual symbols. In the
downward direction, from the highest level to the lower level, the
communication consists of commands for selecting subsets of images,
specifying further processing, and requests for additional information.

A.  IMAGE  UNDERSTANDING  PROCESSES  AND  ALGORITHMS 


Image understanding is the process by which a computer is
"programmed" to understand a scene. Recent developments in the area of
image understanding have been fostered by the artificial intelligence
community involved in machine vision. However, on a more rudimentary
level, image understanding has developed through an extension of
knowledge-based pattern recognition techniques applied to computer vision
systems. Both approaches are based on human visual processes and Gestalt
psychology, where experiments have shown that representing an image in a
symbolic form provides a more effective way to understand a scene,
especially as a scene gets more complex. In this symbolic approach, a
pattern/image is represented as a string, a tree, and/or a directed graph
of pattern primitives and their relations. The decision making and/or
structural analysis then becomes, in general, a parsing procedure.
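
As an illustration of a pattern represented as a string of primitives and recognized by parsing, the sketch below encodes a closed contour as unit moves r, d, l, u (right, down, left, up) and accepts it as a rectangle when the string has the form r^n d^m l^n u^m. The primitive alphabet and the rectangle rule are assumptions chosen for the example, not a grammar taken from this report.

```python
import re

# Structure of the string: a run of r, then d, then l, then u.
RECTANGLE_SHAPE = re.compile(r"(r+)(d+)(l+)(u+)")


def is_rectangle(chain: str) -> bool:
    """Parse a contour string and check the rectangle rule r^n d^m l^n u^m."""
    match = RECTANGLE_SHAPE.fullmatch(chain)
    if match is None:
        return False
    right, down, left, up = (len(group) for group in match.groups())
    # Opposite sides must have equal length for the contour to close.
    return right == left and down == up


print(is_rectangle("rrrddllluu"))
print(is_rectangle("rrrddlluu"))
```

The regular expression checks the order of primitives, while the length comparison enforces the relation between opposite sides, which a purely regular pattern cannot express.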

There  are  some  inherent  differences  between  image  understanding  and 
pattern  recognition  that  should  be  expounded.  First,  pattern  recognition 
systems  are  concerned  typically  with  recognizing  the  input  from  a  small 
set of possibilities. Image understanding aims to construct rich
descriptions of an image that cannot be "programmed" in advance. Second,
pattern recognition systems are mostly concerned with two dimensional
images, whereas image understanding systems operate mostly on three
dimensional images. Finally, pattern recognition systems typically operate
directly  on  the  image.  Image  understanding  systems  operate  on  symbolic 
representations  of  the  image  that  have  been  computed  by  earlier  processes. 
It  should  be  kept  in  mind,  however,  that  these  are  not  strict  definitions. 

One of the main problems today in machine vision has been the
inability of researchers to clearly formulate those mathematical
algorithms used for image understanding. Also, these algorithms have been
so narrowly defined that their application to a generalized computer
vision system is limited. Some of the promising prototype vision systems
today which claim to be performing high level processing, from the 2 1/2 D
to 3 D sketch, are riddled with heuristic and ad hoc procedures. That is,
the processing that is used for the machine vision problem utilizes
schemes that follow some pre-defined rules that are only appropriate for a
specific set of problems. The result is a set of algorithms that are
piecemeal techniques rather than a knowledge base system which evaluates
scene primitives in a robust fashion. Consequently, whether the terms
image understanding, symbolic computing, or pattern recognition are used
to describe processes in an intelligent machine vision system, it is
essential that one has a clear understanding of the algorithms. Once
these algorithms are understood it will then be easier to assess their
utility for designing a reliable computer vision system.

The first process for image understanding is creating the 2D primal
sketch.  Many  of  the  so  called  "low-level"  pattern  recognition  and  image 
processing  techniques  are  used  in  this  development.  Next  is  the  symbolic 
representation  process  which  entails  the  grouping  of  features  into 
symbols.  Surface  extraction  can  also  take  place  in  this  process. 
Finally,  there  is  the  semantic  or  syntactic  process  which  allows  one  to 
translate  the  symbols  of  the  image  into  a  description  of  a  scene. 

B.  PARALLEL IMPLEMENTATIONS - OPTICAL PROCESSING

An important consideration in developing image understanding
algorithms is how well they can be implemented. Many of the operations
performed by understanding algorithms are computationally exhaustive and
hence prohibit real-time implementation. The solution has been to
implement these algorithms in parallel. Most of the approaches in the past
have been for sequential implementation, for the following reasons.
First, the software has been designed for sequential applications.
Second, it was not technically practical or even feasible to implement
these algorithms with a parallel architecture. However, with the advent
of VLSI and VHSIC design, parallel implementation of image understanding
algorithms for machine vision has become possible.

The field of optical processing has grown to the point where
real-time systems are being developed. The two principal advantages of
optical computing are parallel processing, which maps nicely onto many
standard image processing tasks, and its real-time operation. This review
will analyze how well a parallel network, such as optical processing, can
be used to solve the problem of image understanding in a computer vision
system. This correspondence will not attempt to prove that optical
processing can solve the computational procedures but rather will refer to
researchers who have reported success in using optical techniques in these
areas.

The  following  paragraphs  will  look  at  the  many  ways  in  which  a 
machine  vision  system  can  process  and  understand  a  scene  in  an  image. 
Then we will investigate whether this processing can be mapped onto a
parallel processing architecture and/or an optical processing system. The
approach will be to start at the "low-level" processes (refer to Figure
44) and progress to the "higher" levels.

LOW LEVEL
-  ARRAYS OF INTENSITIES
-  DATA DRIVEN
-  PRIMAL SKETCH

INTERMEDIATE LEVEL
-  SYMBOLIC DESCRIPTION OF REGIONS, LINES, SURFACES
-  FEATURE CLASSIFICATION
-  KNOWLEDGE BASE
-  INTRINSIC IMAGE

HIGH LEVEL
-  SYMBOLIC DESCRIPTIONS OF OBJECTS
-  CONTROL STRATEGIES
-  GOAL ORIENTED
-  PLANNING
-  INFERENCE

Figure 44.  Communication, Control, and Representation in the Image
Understanding Paradigm

The goal in image understanding is to create a representation of a
scene that can be understood by a computer. However, as was mentioned
before, this takes place over many levels of processing. The initial
processing involves operating on the raw image data, which consists of a
matrix of pixels that represent the gray level intensity of a scene.
However, in most cases, the scene that is captured by the imaging system is
not an ideal representation. Many standard image processing functions
have been developed to compensate for this discrepancy.

One of the first operations performed in an image processing
environment is to restore and/or enhance an image that has been degraded or
deformed by the imaging system or by the processing techniques used to
recover that image. Some of the techniques used for restoration and/or
enhancement are: (1) spatial filtering, used to accentuate certain
characteristics of the image; (2) non-linear (median) filtering, used to
suppress noise in the image; (3) transform processes, which isolate certain
features of the image; (4) inverse filtering; (5) least squares (Wiener)
filtering; (6) recursive filtering; (7) maximum a posteriori (MAP)
estimation; and (8) maximum entropy.
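
As an illustration of technique (2), the following is a minimal sketch of
a 3x3 median filter in Python with NumPy. The function name
`median_filter_3x3`, the replication of edge pixels at the image border,
and the test image are assumptions made for this example; they are not
taken from the report.

```python
import numpy as np

def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighborhood.

    Borders are handled by replicating edge pixels (an assumption).
    """
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, 1, mode="edge")
    rows, cols = img.shape
    # Stack the nine shifted copies of the image, one per window position.
    shifted = [padded[r:r + rows, c:c + cols]
               for r in range(3) for c in range(3)]
    return np.median(np.stack(shifted), axis=0)

# A flat image with a single impulse ("salt") noise pixel.
noisy = np.full((5, 5), 10.0)
noisy[2, 2] = 255.0
cleaned = median_filter_3x3(noisy)
```

Because every 3x3 window contains at most one corrupted pixel, the median
ignores the impulse, whereas a linear averaging filter would smear it.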

One of the computational methods used to perform the inverse
filtering for image restoration is singular value decomposition (SVD). SVD is
used to model the blurring impulse function (point spread function). If
we represent the point spread function (PSF) P as an MxN matrix of rank R,
then it is possible to decompose matrix P into a product of two unitary
matrices and a diagonal matrix,

$$ P = U \Lambda V^{T} \tag{1} $$

where U is an MxM unitary matrix, V is an NxN unitary matrix and $\Lambda$
is an MxN matrix with a general diagonal entry $\lambda(j)$, called a
singular value of P. It is possible to express Equation (1) in the series
form,

$$ P = \sum_{j=1}^{R} \lambda(j)\, u_j v_j^{T} \tag{2} $$

where the outer products, $u_j v_j^{T}$, of the eigenvectors form a set of
unit rank matrices each of which is scaled by a corresponding singular
value of P.

As mentioned before, image restoration attempts to restore an image
that has been degraded. Using a general vector-space model to represent
the transformation process from the input image, f, to the output image,
g, as

$$ g = P f \tag{3} $$

it is theoretically possible to recapture the ideal image, f, by finding
the inverse of P. If P is properly characterized, then it is possible to
use the concept of SVD to obtain the ideal image, f, by designing a filter
that implements the P inverse operation, which can be expressed as,

$$ P^{-1} = \sum_{j=1}^{R} \lambda(j)^{-1}\, v_j u_j^{T} \tag{4} $$

The ideal image can then be expressed as,

$$ f = P^{-1} g = \sum_{j=1}^{R} \lambda(j)^{-1}\, v_j u_j^{T} g \tag{5} $$

SVD has been a very successful image restoration technique; however, it
does suffer from being computationally intensive. The eigenvectors and
eigenvalues must first be determined for the matrix $PP^{T}$ and its
transpose. Then the vector computations must be performed. Even
pseudoinverse computational algorithms that have been adapted for SVD are
slow and are prone to severe ill-conditioning errors.
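
The restoration described above can be sketched numerically as follows.
This is a minimal illustration in Python with NumPy, not the report's
implementation: the function name `svd_restore`, the relative tolerance
used to drop very small singular values, and the one-dimensional blur
matrix are all assumptions chosen to keep the example short.

```python
import numpy as np

def svd_restore(P, g, rel_tol=1e-10):
    """Estimate f from g = P f using the truncated SVD series.

    Terms whose singular value is below rel_tol times the largest
    singular value are dropped to limit noise amplification.
    """
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    keep = s > rel_tol * s[0]
    # f = sum_j (1 / s_j) * v_j * (u_j^T g)
    coeffs = (U[:, keep].T @ g) / s[keep]
    return Vt[keep].T @ coeffs

# Hypothetical 1-D blur: each output sample is a weighted average of
# its neighbors with weights 0.25, 0.5, 0.25.
n = 8
weights = {-1: 0.25, 0: 0.5, 1: 0.25}
P = np.zeros((n, n))
for i in range(n):
    for k, w in weights.items():
        if 0 <= i + k < n:
            P[i, i + k] = w

f_true = np.array([0, 0, 1, 1, 1, 0, 0, 0], dtype=float)
g = P @ f_true                 # degraded signal
f_est = svd_restore(P, g)      # restored signal
```

Raising `rel_tol` discards the terms with the smallest singular values,
which is one common way to trade resolution for stability when P is
ill-conditioned.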

Parallel languages and architectures are being researched and
developed to handle the preprocessing operations used in computer vision. The
characteristics of the algorithms can determine which architecture is the
most suitable. Two architectures that are appropriate are single
instruction - multiple data (SIMD) and multiple instruction - multiple data
(MIMD).

The general configuration of a SIMD architecture is depicted in
Figure 45. It consists of a single control unit, N processors, N memory
modules, and an interconnection network (CN). Each processor is connected
to its own memory module to form a processing element (PE). All
processors execute the same instructions; hence, there is a single
instruction stream. Each processor executes instructions on the set of data
stored in its own memory module; hence, there is a multiple data stream.
The interconnection network facilitates communication between the
processing elements.

The general configuration of a MIMD architecture is depicted in
Figure 46. It consists of N processors, a shared memory, and an
interconnection network. Each processor executes its own set of
instructions; hence, there is a multiple instruction stream. Each
processor also executes its instructions on the data in the shared memory;
hence, there is a multiple data stream. The processors access the shared
memory through the interconnection network.

SIMD architectures are generally more suitable for low level image
processing where identical operations are performed on a large number of
pixels. The image is partitioned into subimages and each processor is
assigned a single subimage. Then, by executing the same set of
instructions, all processors, in parallel, process all subimages of the
image.

MIMD architectures are generally more suitable for high level image
analysis. The techniques involved in the higher levels of image
understanding, such as the 2 1/2-D sketch, are pattern extraction and image
classification. Images at this stage are represented by data structures
other than simple two dimensional arrays. The algorithms include many
independent operations on common data that are well structured for MIMD
architectures.

Figure 45. General SIMD Configuration

Figure 46. General MIMD Configuration

As mentioned earlier, many techniques used for image restoration
and/or enhancement implement some form of discrete two-dimensional
transform or solve a linear system of equations. Some examples of parallel
implementations of these pre-processing operations follow. A
pipelined/parallel architecture has been developed for the two dimensional
Fourier Transform (FFT).2 Researchers^ have implemented the least-mean-
square (LMS) algorithm for image/signal restoration. Finally, others4 have
used a single instruction - multiple data (SIMD) architecture for image
reconstruction.

Since SVD is based on outer product operations, it has been proposed
that optical processors perform this matrix operation. For years
researchers in optical processing^ have promoted using optical processes
for a variety of linear algebra operations. The use of an optical
architecture for an SVD outer product calculation is very attractive for
image restoration.

The next level of processing involves constructing a primitive but
rich description of an image that can be used for further processing. The
image processing and pattern recognition communities have developed a
variety of ways to recover these primitives. These primitives have been
based on intensity changes found within an image, that is, on edges.

Many linear and nonlinear edge-enhancement and detection operators
have been developed. However, studies have shown^ that the nonlinear
operators provide a better signal-to-noise ratio (SNR) than such linear
operators as high-pass filtering. A particularly attractive nonlinear
edge-enhancement operator is the Sobel operator.

The Sobel operator, like other non-linear operators (Roberts, Kirsch,
Prewitt, etc.), utilizes a non-linear combination of pixels as a means of
edge enhancement. Most operate by processing over a 2x2 or 3x3 pixel
window. The Sobel method operates by sliding a 3x3 window over an entire
image space. If we consider the pixel f(x,y) as the reference pixel, then
by referring to Figure 47, the edge value at e(x,y) is,

$$ e(x,y) = \sqrt{X^{2} + Y^{2}} \tag{6} $$

where,

$$ X = (A_2 + 2A_3 + A_4) - (A_0 + 2A_7 + A_6) \tag{7a} $$

$$ Y = (A_0 + 2A_1 + A_2) - (A_6 + 2A_5 + A_4) \tag{7b} $$

The angle, or the direction of the edge, at e(x,y) can also be calculated
by,

$$ \phi = \tan^{-1}(Y/X) \tag{8} $$

Thus Equations (6) and (7) describe the final image pixel value, e(x,y), as
the nonlinear combination of its surrounding neighbors. It is also
possible to describe the Sobel operator by the convolution of the 3x3 window
with two linear local mask operators, X and Y, described now as,

$$ X = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad Y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \tag{9} $$

Inspection shows that the mask operators in Equation (9) correspond to
spatial differentiation (or, since we are dealing with digital images,
spatial differencing) with different weighting for pixels further from the
mask center.
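
The Sobel computation of Equations (6) through (8) can be sketched as
below. This is a minimal illustration in Python with NumPy under stated
assumptions: the 3x3 window is taken to be numbered A0 A1 A2 across the
top row, A7 and A3 to the left and right of the reference pixel, and
A6 A5 A4 across the bottom row; the masks are applied as a weighted sum
over each window; and the names `MASK_X`, `MASK_Y`, `apply_mask`, and
`sobel` are hypothetical.

```python
import numpy as np

# Mask operators laid out on the assumed 3x3 window
#   A0 A1 A2
#   A7  f A3
#   A6 A5 A4
MASK_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
MASK_Y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

def apply_mask(image, mask):
    """Weighted sum of each interior 3x3 window with the mask."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    out = np.zeros((rows - 2, cols - 2))
    for r in range(3):
        for c in range(3):
            out += mask[r, c] * img[r:r + rows - 2, c:c + cols - 2]
    return out

def sobel(image):
    """Return edge magnitude and edge angle (radians) for interior pixels."""
    x = apply_mask(image, MASK_X)
    y = apply_mask(image, MASK_Y)
    return np.hypot(x, y), np.arctan2(y, x)

# Vertical step edge: dark left half, bright right half.
step = np.zeros((5, 6))
step[:, 3:] = 1.0
magnitude, angle = sobel(step)
```

For this step image the magnitude is 4 in the two columns that straddle
the step and 0 elsewhere, which also shows the smearing of the response
across the edge location.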

Since the Sobel operator is a local operation, as are the other edge
detection operators, parallel implementation is possible. Many
parallel systems are being built to perform these local neighborhood
operations. The general scheme is to design an architecture where individual
processors are used to operate on individual neighborhoods using a SIMD
framework.

Figure 47. Numbering Convention for the Sobel Operator

The Geometric Arithmetic Parallel Processor (GAPP) system7 is
currently being used to perform local operations, such as the Sobel
filter. The GAPP chip is an array of 72 parallel bit-serial processor
elements that function in a systolic mode. The operations performed by
the GAPP include convolving, sorting, and other arithmetic procedures.

In addition, optical systems can perform the linear spatial
differentiation operation by implementing an optical matched spatial filter
correlator. Briefly, the optical system's output is the convolution of the
input image f(x,y) and the reference function h (i.e., the impulse
response of the system). When h is two delta functions separated by d,
the system's output is

$$ \text{output} = f * h = f(x,y) * [\delta(x+d,y) - \delta(x,y)] = f(x+d,y) - f(x,y) \tag{10} $$

or the first difference linear two-point approximation to the 1-D spatial
differentiation of the input image f(x,y).

By rewriting the Sobel output function from Equation (6) as,

$$ g(m,n) = \sqrt{X^{2} + Y^{2}} = |X + jY| \tag{11} $$

we are able to implement this complex arithmetic function optically so
that the output amplitude at each point is,

$$ |X + jY| = \left| \left( (A_2 + 2A_3 + A_4) - (A_0 + 2A_7 + A_6) \right) + j \left( (A_0 + 2A_1 + A_2) - (A_6 + 2A_5 + A_4) \right) \right| \tag{12} $$

We can describe Equation (12) as in Equation (10) by the convolution
of the input function f(x,y) with a sum of delta functions at eight
spatial locations with complex-valued weights for each delta function,

$$
\begin{aligned}
g(x,y) = f(x,y) * [\, & (1+j)\,\delta(x-d,\,y-d) + 2\,\delta(x-d,\,y) + (1-j)\,\delta(x-d,\,y+d) \\
& + 2j\,\delta(x,\,y-d) - 2j\,\delta(x,\,y+d) \\
& - (1-j)\,\delta(x+d,\,y-d) - 2\,\delta(x+d,\,y) - (1+j)\,\delta(x+d,\,y+d) \,]
\end{aligned}
\tag{13}
$$

where d is the spacing between pixels in the input image and where the
weights and locations of each delta function are obtained from Equation
(12).
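
The equivalence between the complex-weighted form and the two-mask Sobel
form can be checked numerically. The sketch below is a digital check in
Python with NumPy, not a model of the optical system; the window layout
(A0 A1 A2 on the top row, A7 and A3 beside the reference pixel, A6 A5 A4
on the bottom row) and the names `COMPLEX_MASK` and `weighted_window_sum`
are assumptions made for the example.

```python
import numpy as np

# Complex weights of Equation (12) on the assumed window layout
#   A0 A1 A2
#   A7  f A3
#   A6 A5 A4
COMPLEX_MASK = np.array([[-1 + 1j, 2j, 1 + 1j],
                         [-2, 0, 2],
                         [-1 - 1j, -2j, 1 - 1j]])

def weighted_window_sum(image, mask):
    """Weighted sum of each interior 3x3 window with the mask."""
    img = np.asarray(image)
    rows, cols = img.shape
    out = np.zeros((rows - 2, cols - 2), dtype=complex)
    for r in range(3):
        for c in range(3):
            out += mask[r, c] * img[r:r + rows - 2, c:c + cols - 2]
    return out

rng = np.random.default_rng(0)
image = rng.random((6, 7))

# One pass with complex weights, then the amplitude |X + jY|.
amplitude = np.abs(weighted_window_sum(image, COMPLEX_MASK))

# Two real passes, X and Y, combined as sqrt(X^2 + Y^2).
x = weighted_window_sum(image, COMPLEX_MASK.real).real
y = weighted_window_sum(image, COMPLEX_MASK.imag).real
magnitude = np.sqrt(x ** 2 + y ** 2)
```

The two results agree to rounding error, so a single linear operation
with complex weights followed by an amplitude detection yields the same
nonlinear edge value as the two-mask formulation.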

Researchers^ have demonstrated that it is possible to realize both
the desired impulse responses and complex-valued weighting in Equation (13)
optically. Specifically, the impulse response could be achieved by
forming the holographic matched spatial filter (MSF) of an input function
containing delta functions (apertures) at the correct locations and of the
correct radii (to adjust the intensity) and with the necessary phase
factors achieved by placing $\lambda/4$ and $\lambda/2$ plates behind the
appropriate apertures. The complex-valued weights for each delta function
could be achieved by shifting the wedge in 1-D in its plane.

Another local edge operator that has been of interest, especially in
the AI community, is the Laplacian of a Gaussian ($\nabla^{2}G$) edge finding
technique. This operator has been used to find edges to create what Marr has
termed the "primal sketch". Marr and Hildreth's reasoning for using this
type of operator is to smooth the image before taking the derivative, since
derivatives are inherently noisy. The smoothing is done by convolving the
image with a Gaussian. Intensity changes, which characterize edges, are
found by locating the zero crossings produced by implementing the
Laplacian, which is a non-directional linear second derivative operator. The
$\nabla^{2}G$ operator can be implemented by: (1) scaling the operator values by
some constant; (2) using nearest integer values for each scaled operator
value; (3) extending the support of the filter (window) to include all
nonzero integer values; and (4) manipulating operator values by a small
amount to ensure that the values integrate to zero.
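
The four implementation steps listed above can be sketched as follows.
This is one possible reading of those steps in Python with NumPy, not the
procedure used by the original authors: the scale factor of 100, the
choice to absorb the rounding residue in the center value, and the names
`log_value` and `integer_log_mask` are assumptions.

```python
import numpy as np

def log_value(x, y, sigma):
    """Laplacian of a Gaussian, up to a positive constant factor."""
    r2 = x ** 2 + y ** 2
    return (r2 - 2.0 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2.0 * sigma ** 2))

def integer_log_mask(sigma=1.0, scale=100.0):
    # Step 3: grow the half-width until the scaled values on the
    # outermost ring all round to zero.
    half = 1
    while True:
        ax = np.arange(-half, half + 1)
        xx, yy = np.meshgrid(ax, ax)
        # Steps 1 and 2: scale, then take nearest integers.
        mask = np.rint(scale * log_value(xx, yy, sigma)).astype(int)
        ring = np.concatenate([mask[0], mask[-1], mask[:, 0], mask[:, -1]])
        if not ring.any():
            break
        half += 1
    mask = mask[1:-1, 1:-1]       # drop the all-zero outer ring
    # Step 4: absorb the rounding residue in the center value so that
    # the mask sums to zero.
    center = mask.shape[0] // 2
    mask[center, center] -= mask.sum()
    return mask

mask = integer_log_mask()
```

A mask that sums to zero gives no response over regions of constant
intensity, which is why the final adjustment matters for locating zero
crossings.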

It is interesting to note why a Gaussian was chosen as the optimal
smoothing filter and the Laplacian was chosen for finding the edges. Two
physical considerations have combined to decide on using a Gaussian. The
first is the need to reduce the range of scales over which intensity
changes take place. This condition can be expressed by requiring that the
spatial frequency spread, $\Delta\omega$, be small. The second consideration is
that the visual world is not constructed of primitives that extend over large
areas, but rather of localized occurrences of pattern primitives, such as
contours, shadows, creases, etc. This condition can be expressed by
requiring that the spread in the spatial domain, $\Delta x$, be small.

Unfortunately, these two requirements are conflicting, since,

$$ \Delta x \propto \frac{1}{\Delta\omega} $$

However, there is a relation, called the uncertainty principle, which
states that $\Delta x \, \Delta\omega \geq 1/4\pi$. The distribution that
optimizes this relation is the Gaussian. Thus, Marr and his predecessors
have chosen the Gaussian as the optimal filter for image smoothing.

The reason for using the Laplacian is two-fold. First, an edge can
be characterized as a peak in the first directional derivative, or
equivalently, a zero-crossing in the second directional derivative of
intensity. The Laplacian is a second order derivative. Second, one needs an
operator that is independent of the orientation. The Laplacian is the only
orientation-independent second-order differential operator.

The difference-of-Gaussian (DOG) operator is an approximation to the
Laplacian of a Gaussian convolved with the image. The DOG operator is
created by the addition of one positive and one negative Gaussian
weighting function, with the widths (standard deviations) of the two
Gaussians having a ratio of about 1.6. The result of subtracting the two
Gaussians to create the DOG operator is shown, in one dimension, in
Figure 48. The output of this difference resembles a Mexican hat.
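
A one-dimensional DOG profile can be generated with a few lines of Python
and NumPy. In this sketch the ratio of 1.6 is applied to the standard
deviations of the two Gaussians, which is an assumption of the example,
as are the sampling grid and the names `gaussian` and `dog`.

```python
import numpy as np

def gaussian(x, sigma):
    """Unit-area Gaussian with standard deviation sigma."""
    return np.exp(-x ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

def dog(x, sigma, ratio=1.6):
    """Narrow positive Gaussian plus a wider negative Gaussian."""
    return gaussian(x, sigma) - gaussian(x, ratio * sigma)

x = np.linspace(-8.0, 8.0, 161)   # sample spacing 0.1
profile = dog(x, sigma=1.0)
```

The profile has a positive central lobe flanked by shallow negative
lobes, and because both Gaussians have unit area it sums to approximately
zero over the sampled range.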


An application of the DOG operator for an intermediate vision
understanding scenario is stereo matching.9 The algorithm consists of: (1)
calculating the edges using the DOG operator at a coarse scale (i.e.,
where the window size or mask is large); (2) matching the edges in the
binocular views; (3) calculating the edges at a less coarse scale; (4)
matching the edges, within the broad limits of the previous pass; and (5)
iterating for finer resolution images, until complete.

Figure 48. Creation of the DOG Edge Operator, "Mexican Hat"

Researchers^ are presently in the process of constructing hardware
to implement the DOG operator in a parallel fashion. More recently, an
efficient optical imaging device that performs linear convolutions with
two-dimensional circularly symmetric operators in parallel has been
designed using VLSI technology.^

Very often the edges that are produced from implementing some local
operator or edge fitting procedure also need some processing. For many
man-made objects, edges correspond to the juxtaposition of surfaces or to
shadows. In a well focused image, edges should appear sharp and should
extend in some direction for some length. Obviously, in the natural
world, the boundaries are not necessarily as sharply defined, e.g., trees,
fields, etc. However, the output of the operators described previously is
generally smeared at or near the edge location. For many image
understanding problems it is necessary to localize the edge so that it lies
along the object boundaries. Given a knowledge of the edge detector, it
is possible to design a process which accepts the output of the operator
and produces a "thinned" representation of the edge at the location of
maximum edge response to produce an edge map.

One of the more common techniques for edge thinning uses a process of
non-maximum suppression. Non-maximum suppression is a process where edge
magnitude values that are not the maximum along the direction normal to
the edge are suppressed. Obviously, it would be incorrect to consider simply
those edge values of maximum response, since this would force adjacent
points in the direction of the edge to compete. In practice, directional
masks (north, south, east, west), among other similar techniques, are applied
directly to the edge values to create the edge map. The algorithm
proceeds by associating a directional mask with each edge point, oriented
normal to the maximum edge response at that point. The center point is
then deleted (assigned zero response) if any point within the mask has a
greater response. The edge masks are shown in Figure 49. One of the
main considerations for edge thinning is maintaining digital
connectedness, whether it be 4-neighbor or 8-neighbor.

Figure 49. Edge Thinning Masks for Each Principal Edge Direction
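
The thinning rule described above can be sketched as follows. This is a
simplified illustration in Python with NumPy, assuming that each pixel
carries an edge magnitude and a gradient direction quantized to one of
four orientations; the encoding of directions as the integers 0 to 3 and
the names `NORMAL_OFFSETS` and `non_maximum_suppression` are hypothetical
and do not reproduce the masks of Figure 49.

```python
import numpy as np

# Offsets (row, col) of the two neighbors that lie along the normal to the
# edge, indexed by a quantized gradient direction: 0 = east-west,
# 1 = northeast-southwest, 2 = north-south, 3 = northwest-southeast.
NORMAL_OFFSETS = {0: ((0, -1), (0, 1)),
                  1: ((-1, 1), (1, -1)),
                  2: ((-1, 0), (1, 0)),
                  3: ((-1, -1), (1, 1))}

def non_maximum_suppression(magnitude, direction):
    """Zero every interior pixel that has a larger neighbor along the normal."""
    mag = np.asarray(magnitude, dtype=float)
    thinned = np.zeros_like(mag)
    rows, cols = mag.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            keep = True
            for dr, dc in NORMAL_OFFSETS[int(direction[r, c])]:
                if mag[r + dr, c + dc] > mag[r, c]:
                    keep = False
            if keep:
                thinned[r, c] = mag[r, c]
    return thinned

# A smeared vertical edge: the response peaks in column 2.
magnitude = np.tile(np.array([0.0, 1.0, 3.0, 1.0, 0.0]), (5, 1))
direction = np.zeros((5, 5), dtype=int)   # gradient points east-west everywhere
thinned = non_maximum_suppression(magnitude, direction)
```

Only the column of peak response survives, and points along the edge do
not compete with one another because the comparison is made only across
the edge.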

Since these edge masks perform local operations, they can be
implemented in parallel. This thinning algorithm is best processed by a SIMD
architecture. In this architecture the image would be divided among the
processing elements on an equal size basis. Then each processing element
would convolve the operator masks with the portion of the image it holds.
The interconnection network would be used to transfer boundary pixels
needed by neighboring processing elements.
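
The division of the image among processing elements, together with the
transfer of boundary pixels, can be imitated serially. The sketch below
in Python with NumPy is only an analogy for the SIMD arrangement: the
loop over strips stands in for processors running in parallel, the extra
rows attached to each strip stand in for the transfer over the
interconnection network, and the local maximum is an arbitrary example of
a 3x3 neighborhood operation. The names `local_max_3x3` and
`process_in_strips` are hypothetical.

```python
import numpy as np

def local_max_3x3(image):
    """Example local operation: maximum over each interior 3x3 window."""
    rows, cols = image.shape
    windows = [image[r:r + rows - 2, c:c + cols - 2]
               for r in range(3) for c in range(3)]
    return np.max(np.stack(windows), axis=0)

def process_in_strips(image, n_pe, operation):
    """Apply `operation` strip by strip, as n_pe processing elements would.

    Each strip is extended by one boundary row from each neighbor, which
    stands in for the transfer over the interconnection network.
    """
    padded = np.pad(image, 1, mode="edge")
    bounds = np.linspace(0, image.shape[0], n_pe + 1).astype(int)
    pieces = []
    for top, bottom in zip(bounds[:-1], bounds[1:]):
        strip = padded[top:bottom + 2]        # own rows plus two boundary rows
        pieces.append(operation(strip))       # same instructions on every PE
    return np.vstack(pieces)

rng = np.random.default_rng(1)
image = rng.random((12, 9))
whole = local_max_3x3(np.pad(image, 1, mode="edge"))
tiled = process_in_strips(image, n_pe=4, operation=local_max_3x3)
```

The strip-wise result matches the result computed on the whole image,
which is the property that makes equal-size partitioning safe for local
operators as long as the boundary rows are exchanged.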

Finally, this edge thinning algorithm has the potential of being
implemented in an optical processing environment, similar to the edge
operators.

CHAPTER III

SEGMENTATION AND THE FULL PRIMAL SKETCH

The next step after we have produced the primitive characteristics of
the scene is to find parts of the scene that belong to the same class,
that is, we need to segment the image. Image segmentation is the process
by which an image of a scene is broken into separate parts (regions or
objects) in order to generate a description of that scene. Generally,
segmentation is a process of pixel classification, that is, the picture is
segmented into subsets by assigning the individual pixels to classes.
Many different techniques have been developed for image segmentation, most
being general purpose while others are dependent on the particular
application.

Image segmentation falls into three general categories: (1)
characteristic feature thresholding and clustering; (2) edge detection;
and (3) region extraction. Examples of some of the segmentation
techniques used include thresholding and spectral (color) signature
classification,12 region growing,13 pyramid approaches (linking and spot
detection),14 double window filters,15 border/edge followers,^
relaxation,17 superslice,15 superspike,19 etc. A more detailed
explanation of these techniques can be found in an accompanying report.

Although these standard segmentation techniques are fairly widely
applicable, it is not always obvious, given a class of images, which of
them are applicable to that class. In fact, the success of a particular
technique is dependent on how well an image satisfies a particular set of
assumptions, which are not always explicitly stated. Therefore it is
important that the assumptions (models) that underlie the various basic
image segmentation techniques be fully understood in order to use them
successfully.

The standard method of segmenting an image is by thresholding. Here
the classes correspond to grey level ranges, e.g. "light = hot" and "dark
= cool". Since these ranges are not known in advance, they must be
determined by examining the grey level histogram and looking for peaks
(one-dimensional clusters) and choosing thresholds (one-dimensional
decision surfaces) that separate the peaks.
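
Threshold selection from the grey level histogram can be sketched as
follows. This is a deliberately simple illustration in Python with NumPy
that assumes a histogram with at least two well separated peaks; the
box-filter smoothing, the rule of taking the two highest local maxima,
the synthetic image, and the name `valley_threshold` are all assumptions
made for the example.

```python
import numpy as np

def valley_threshold(image, levels=256, smooth=5):
    """Pick the grey level at the histogram valley between the two main peaks.

    Assumes integer grey levels and at least two peaks in the histogram.
    """
    hist = np.bincount(np.asarray(image).ravel(), minlength=levels).astype(float)
    kernel = np.ones(smooth) / smooth
    hist = np.convolve(hist, kernel, mode="same")     # suppress spurious peaks
    # Local maxima of the smoothed histogram.
    peaks = [g for g in range(1, levels - 1)
             if hist[g] > hist[g - 1] and hist[g] >= hist[g + 1]]
    # Keep the two highest peaks, ordered by grey level.
    first, second = sorted(sorted(peaks, key=lambda g: hist[g])[-2:])
    return first + int(np.argmin(hist[first:second + 1]))

# Synthetic image: a dark cluster near level 50 and a bright one near 200.
dark = np.repeat(np.arange(47, 54), [20, 40, 60, 80, 60, 40, 20])
bright = np.repeat(np.arange(197, 204), [10, 20, 30, 40, 30, 20, 10])
image = np.concatenate([dark, bright]).reshape(-1, 20)

threshold = valley_threshold(image)
segmented = image >= threshold        # True marks the "light" class
```

When the clusters overlap, the valley becomes shallow or disappears, and
a rule of this kind no longer gives a reliable threshold.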

It is also possible to segment an image by thresholding the values of
the local properties (other than the grey level) measured at each point.
For example, suppose that a picture is composed of "busy" regions and
"smooth" regions. It should be possible, in principle, to segment the
image into these regions by computing some local measure of "busyness" at
each point, and thresholding the values of this measure. Even better
results can be obtained if we smooth the "busyness" values by locally
averaging them over a neighborhood of each point since this tends to
reduce variability of the values and hence make the histogram peaks easier
to separate. The "busyness" values can be found by applying a difference
operator over a neighborhood of each point.

Multidimensional thresholding has been used by Ohlander^ for a
segmentation scheme for natural color images. Ohlander's method is
considered a recursive region splitting method. The algorithm can be
described by referring to Figure 50. The first step is to select a
region of an image. The second step is to compute the histograms for all
features for the portion of the image which is contained in the region.
The histograms were representative of the red, green, blue tristimulus
values, the television transmission tristimulus values (Y,I,Q) and a set
of non-standard color coordinates called intensity, hue, and saturation.
The third step is to select the "best" peak in the set of histograms.
Fourth, threshold the image to a binary form using the upper and lower
thresholds derived from the upper and lower bounds for the best peak in
the set of histograms. Fifth, select the connected regions. Sixth, save
these regions and check each region for further segmentations. Finally,
continue the segmentation on the remainder of the region which was being
segmented, terminating when there are too few points left.

Price20 extended this method for segmenting monochromatic images.
His procedure implemented two general modifications: (1) planning, used to
improve the speed of the procedure, and (2) texture operators. The
planning procedure21 performs the segmentation on a reduced version of the
image and uses this segmentation as a plan for the final segmentation of
the full size image. The textural operators are added to generate more
features for segmentation. They are based on edge operators.

Figure 50. Segmentation by Recursive Region Splitting

Initial analysis has indicated that thresholding by cluster detection
in histograms is not adequate for separating targets from background.
Grey level target and background clusters are often not separable, i.e.,
their probability densities overlap. The basic disadvantage of
segmentation schemes which use only local feature values is that they
attempt to classify image parts without regard to their relative positions
in the image. The segmentation techniques that are being implemented
today, which are mentioned in an accompanying report, rely on processes
that make use not only of similarity but also of proximity.

Segmentation by thresholding is a common low level image
understanding technique. There has been research into implementing this
procedure in a parallel environment in order to increase the speed of
computations. Two popular architectures, the WARP22 and ZMOB,23 have
implemented a histogramming technique with thresholding successfully.
Another successful architecture, PASM,24 uses a partitionable SIMD/MIMD
system for building histograms. The GAPP system has also been used to
perform histogramming. Finally, researchers^ have proposed ways to
threshold in an optical processing environment using a microchannel
spatial light modulator.

Segmentation based on edge detection is very common. Many
algorithms exist that capitalize on the information supplied by
edges. An algorithm that links edges based on characteristics
of the imagery has been used successfully in a number of applications.^

An extension of this technique has been used in an object
recognition/classification mode where symbolic descriptions are created
from knowledge about the scene. The algorithm is used to extract linear
features in an image by a process of edge detection and derive higher
level descriptions from the extracted lines.

The algorithm consists of the following steps. First, determine edge
magnitude and direction by convolution of the image with a number of edge
masks. Second, thin and threshold the edge magnitudes. Third, link the
edge elements based on proximity and orientation. Fourth, approximate
linked elements by piecewise linear segments. The first two steps in the
algorithm are similar to the techniques of edge detection and thinning
that have already been discussed. The "linking" procedure is the main
crux of the algorithm. The linking algorithm is based on a set of rules
which have been derived both heuristically and through a general knowledge
about edges. Using an eight-neighbor model for connectedness, neighbors
of edge elements are determined by predecessors and/or successors. Rules
are set up to decide whether a predecessor/successor is found. The result
is edge point configurations giving the "best" open boundary. Then an
intermediate step is initiated to form boundary segments by operating on
the predecessor/successor edge image. Finally, an iterative end-point fit
algorithm is implemented to form line segments for the most distinct
boundaries.
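
The final step, approximating a chain of linked edge points by line
segments, can be sketched with a standard end-point fit. The version
below, in plain Python, is written recursively for brevity and is not
claimed to match the implementation referenced in the report; the
distance tolerance, the sample chain, and the names `point_line_distance`
and `end_point_fit` are assumptions.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length

def end_point_fit(points, tolerance):
    """Approximate an ordered chain of linked edge points by line segments.

    Returns the retained vertices, including both end points.
    """
    if len(points) < 3:
        return list(points)
    distances = [point_line_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[worst - 1] <= tolerance:
        return [points[0], points[-1]]
    # Split at the farthest point and fit each half separately.
    left = end_point_fit(points[:worst + 1], tolerance)
    right = end_point_fit(points[worst:], tolerance)
    return left[:-1] + right

# An L-shaped chain of linked edge points with a slight wobble.
chain = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (3.1, 1), (2.9, 2), (3, 3)]
vertices = end_point_fit(chain, tolerance=0.5)
```

For this chain the fit keeps the two end points and the corner, giving
two line segments that stay within the tolerance of every linked point.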

Top-down knowledge about the scene can be used to create a symbolic
description for certain classes of objects. The symbolic description is
in the form of a logical rule base. For example, applying the edge
linking algorithm to imagery that is known to contain roads and runways
would define a rule base that searches for elongated parallel lines, where
the lines have opposite contrasts. Additional search information that would
help in describing these elongated parallel lines is the width of the pair
of lines and medial line information.

The edge linking algorithm performs predominantly local operations
and thus could be implemented in a parallel architecture. An appropriate
architecture is SIMD. Since many of the rules used in the linking
algorithm can be programmed as logical comparisons, the linking can be
performed in local neighborhoods by processing elements. It may be more
appropriate to have the image divided among the processing elements on an
equal "interest" basis, rather than having the PEs divided on an equal
size basis, to maintain continuity in boundary formulation.

The third type of segmentation procedure is region growing. Region
growing uses image characteristics to map individual pixels in an input
image into sets of pixels called regions. The most primitive region growers
use only aggregates of properties of local groups of pixels to determine
regions. More sophisticated techniques "grow" regions by merging more
primitive regions.

Region growing techniques can be split into three different areas:
(1) local, (2) global, and (3) splitting and merging. Local techniques
involve placing pixels in a region based on their individual properties or
properties of their close neighbors. Global techniques group regions on
the basis of the properties of large numbers of pixels distributed
throughout the image. The algorithm used by Ohlander is an example of a
global procedure. Finally, splitting and merging techniques operate on
individual pixels and use state space procedures to merge or split regions
using graph structures to represent regions or boundaries.
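
A local region growing technique of the first kind can be sketched in
plain Python. In this minimal example a pixel joins a region when its
grey level differs from an adjacent pixel already in the region by no
more than a threshold; the four-neighbor connectivity, the threshold
value, and the name `grow_regions` are assumptions made for the
illustration.

```python
from collections import deque

def grow_regions(image, max_difference):
    """Label 4-connected regions whose adjacent pixels differ by at most
    max_difference in grey level. Returns a label map of the same shape."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0]:
                continue
            # Start a new region at the first unlabeled pixel.
            next_label += 1
            labels[r0][c0] = next_label
            frontier = deque([(r0, c0)])
            while frontier:
                r, c = frontier.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc]
                            and abs(image[nr][nc] - image[r][c]) <= max_difference):
                        labels[nr][nc] = next_label
                        frontier.append((nr, nc))
    return labels

image = [[10, 11, 50, 51],
         [10, 12, 52, 50],
         [90, 91, 51, 50]]
labels = grow_regions(image, max_difference=3)
```

Because the decision uses only adjacent pixel pairs, slow intensity ramps
can chain unrelated areas into one region, which is one motivation for
the global and split-and-merge techniques.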

The splitting and merging technique proposed by Brice and Fennema,^
and later extended by Feldman and Yakimovsky27 and others,28,29 has been
an important development for segmentation since domain-dependent
"semantics" were incorporated into the analysis. That is, knowledge about
the scene was used to improve the performance of the segmentation
procedure. Also, their algorithm promoted attempts to understand complex
scenes using an artificial intelligence methodology.

In order to perform splitting and merging, Brice and Fennema developed
a boundary representation. This representation (refer to Figure 51)
describes the image in the form of two grids, the supergrid, S, and the
image grid, G. The . and + represent the supergrid and 0 represents the
sub-grid. The representation is assumed to be four-neighbor. Boundaries
of regions are then defined at points marked +.

The region growing algorithm can be described in two parts. The first
is the non-semantic algorithm. Proceed by merging regions i and j as long as
they have one weak separating edge. A weak separating edge is defined
heuristically with a threshold criterion: regions i and j are merged when
S(i,j) passes the threshold test against T1, and merging continues
until no two regions pass this test.

The second part is the semantic algorithm. Let Bij be the boundary
between Ri and Rj. Evaluate each Bij with a Bayesian decision function
that measures the conditional probability that Bij separates the two
regions Ri and Rj of the same "interpretation". Merge regions Ri and Rj
if the conditional probability is less than some threshold. Then evaluate
the "interpretation" of each region, Ri, with a Bayesian decision
function. Assign the interpretation to the region with the highest
confidence of correct interpretation. Update the conditional
probabilities for different interpretations of neighbors.

The  semantic  part  of  this  region  growing  algorithm  maximizes  an 
evaluation  function  that  measures  the  probability  of  a  correct 
interpretation,  given  the  measurements  on  the  boundaries  and  regions  of 
the partition. A criterion function was derived to determine the
probability  that  a  boundary  Bij  between  regions  Ri  and  Rj  is  false. 
Finally,  a  confidence  measure  was  derived  as  a  ratio  of  conditional 
probabilities  to  determine  the  most  likely  interpretations. 

Implementation of region growing in a parallel environment is
feasible since many of the operations are performed on a local
neighborhood of pixels. The difficulty would be in "programming" the
semantic criteria, as proposed in the previous algorithm, in parallel.

A.  TEXTURE AND OPTICAL FLOW

Two characteristics of images that have become increasingly useful for
segmentation are texture and motion (optical flow). Texture has been used
by many researchers for image understanding in scene analysis.
Unfortunately, there have been as many definitions of what texture is as
there have been researchers investigating it. Image texture can be
qualitatively evaluated as having one or more of the properties of
fineness, coarseness, smoothness, randomness, etc. Texture can be
described by the number and types of texture primitives (texels) and the
spatial organization of its primitives. The spatial organization may be
random, may have a dependence on one primitive, or may have a dependence on
n primitives at a time. The dependence may be statistical or structural.

The  structural  models  represent  the  texels  as  a  repeating  pattern  and 
construct  rules  for  generating  them.  The  model  is  best  suited  for 
describing  patterns  which  have  texels  that  are  highly  regular.  An  example 
of  a  structural  approach  to  texture  analysis  will  be  given  in  the 
syntactic  pattern  recognition  section  in  Chapter  IV. 

The statistical models describe texture by statistical rules that
govern the distribution and relation of grey levels. This technique is
appropriate for images of natural scenes whose texels are hard to
differentiate. Many of the techniques used in the statistical approach
will also be referenced in Chapter IV. The general procedure is to create
a set of features based on texels in an image for use in a classification
scheme.

One of the statistical techniques that has had much success for
texture analysis is the spatial grey level dependence (SGLD) co-occurrence
matrix. The following provides a general description of the co-occurrence
matrix. Let $\{F\}$ represent the discrete, quantized ($M \times N$) image
matrix. Let each element $F(i,j)$ be a 3-bit integer, giving a grey level
range of $[0, 2^3 - 1]$. $P_{d,\theta}(u,v)$ denotes the relative frequency
with which two pixels, one with grey level $u$ and one with grey level $v$,
separated by a distance $d$ in the direction $\theta$, occur in the image
matrix $\{F\}$. For example, $P_{1,\theta}(u,v)$ is the directional joint
grey level occurrence of adjacent pixels in a 3x3 neighborhood that are
separated by distance one. Common practice is to restrict $\theta$ to
multiples of $\pi/4$. The directional matrices $[P_{d,\theta}]$ describe
the second order statistics of an image for a given spatial direction,
$\theta$, and separation, $d$. The co-occurrence matrix
can be used in histogram sharpening for segmentation. A set of features
can then be calculated from the co-occurrence matrix, such as entropy,
contrast, maximum probability, etc., for training in the classification
procedure.
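
As an illustration of the description above, the following is a minimal Python sketch of a co-occurrence matrix for one displacement, together with three of the named features. The function names, the tiny 4-level test image, and the choice to count ordered (non-symmetric) pixel pairs are assumptions made for this example only; they are not taken from the report.

```python
import math

def cooccurrence(image, dx, dy, levels):
    """Relative frequency P[u][v] of grey level u at (r, c) and v at (r + dy, c + dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    # Normalize the pair counts to relative frequencies.
    return [[n / total for n in row] for row in counts]

def texture_features(P):
    """Entropy, contrast and maximum probability of a co-occurrence matrix."""
    entropy = -sum(p * math.log2(p) for row in P for p in row if p > 0)
    contrast = sum((u - v) ** 2 * p
                   for u, row in enumerate(P) for v, p in enumerate(row))
    max_prob = max(p for row in P for p in row)
    return {"entropy": entropy, "contrast": contrast, "max_probability": max_prob}

# Hypothetical 4x4 image with grey levels 0..3.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P = cooccurrence(image, dx=1, dy=0, levels=4)  # separation 1, direction 0
features = texture_features(P)
```

Each pixel pair is handled independently of the others, which is the locality that the parallel implementation discussed next relies on.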

Parallel implementation of the co-occurrence matrix is possible
because of the local nature of the operations being performed. An SIMD
architecture similar to the one used for the edge operators can be used
to implement this algorithm. The differences would be in the
manner in which the PEs are programmed.

The use of optical flow for segmentation has become increasingly
popular over the last few years. Optical flow is the distribution of
apparent velocities of irradiance patterns in a dynamic image. The
velocity pattern and its discontinuities can be an important source of
information about the arrangement and the motions of visible surfaces.
When two adjacent surfaces undergo different rigid motions, a
discontinuity in the velocity field usually results along their
boundary. Since a surface is continuous across a crease, regions in the
image on either side of a surface orientation discontinuity exhibit
consistent velocities, but the velocity gradient usually differs. In
contrast, regions on either side of an occluding boundary can have
arbitrary velocities.

Horn and Schunck [30] recently suggested a technique for determining
optical flow in the restricted case where the observed velocity of the
image irradiance patterns can be attributed directly to the movement of
surfaces in the scene, i.e., the irradiance at a point in the image is
proportional to the reflectance of the surface at the corresponding point
in the object (there is no shading). Under these circumstances, the
relation between the change in the image irradiance at a point (x,y) in
the image plane at time t and the motion of the irradiance pattern is
given by the flow equation,

$$B_x u + B_y v + B_t = 0 \tag{17}$$

where $B(x,y,t)$ is the image irradiance, and $u = dx/dt$ and $v = dy/dt$ are
the optical flow components. By factoring in appropriate error terms and
using the calculus of variations, Horn and Schunck derived an iterative
solution, using a relaxation procedure, which resulted in the following
equations,

$$u^{(k+1)} = \bar{u}^{(k)} - f_x \frac{L}{M} \tag{18a}$$

$$v^{(k+1)} = \bar{v}^{(k)} - f_y \frac{L}{M} \tag{18b}$$

where,

$$L = f_x \bar{u} + f_y \bar{v} + f_t \tag{19a}$$

$$M = \lambda^2 + f_x^2 + f_y^2 \tag{19b}$$

where $f_x$, $f_y$, $f_t$ are the partial derivatives of the image
irradiance with respect to x, y, and t, $\bar{u}$ and $\bar{v}$ are the
four-neighbor local average velocity components, and $\lambda$ is
the Lagrange multiplier used in the smoothness constraint.
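
One relaxation step of Equations (18) and (19) can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function names, the border handling (a border pixel reuses its own value for a missing neighbor), and the uniform test pattern are assumptions made for the example.

```python
def local_average(field, r, c):
    """Four-neighbor average; a missing neighbor is replaced by the pixel itself."""
    rows, cols = len(field), len(field[0])
    total = 0.0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            total += field[rr][cc]
        else:
            total += field[r][c]
    return total / 4.0

def relax_once(u, v, fx, fy, ft, lam):
    """Compute the new approximation (u, v) from the old one, Equations (18)-(19)."""
    rows, cols = len(u), len(u[0])
    u_new = [[0.0] * cols for _ in range(rows)]
    v_new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            u_bar = local_average(u, r, c)
            v_bar = local_average(v, r, c)
            L = fx[r][c] * u_bar + fy[r][c] * v_bar + ft[r][c]   # (19a)
            M = lam ** 2 + fx[r][c] ** 2 + fy[r][c] ** 2          # (19b)
            u_new[r][c] = u_bar - fx[r][c] * L / M                # (18a)
            v_new[r][c] = v_bar - fy[r][c] * L / M                # (18b)
    return u_new, v_new

# Hypothetical 3x3 pattern with f_x = 1, f_y = 0, f_t = -1 everywhere,
# i.e. a pattern translating one pixel per frame in +x.
size = 3
fx = [[1.0] * size for _ in range(size)]
fy = [[0.0] * size for _ in range(size)]
ft = [[-1.0] * size for _ in range(size)]
u = [[0.0] * size for _ in range(size)]
v = [[0.0] * size for _ in range(size)]
for _ in range(50):
    u, v = relax_once(u, v, fx, fy, ft, lam=1.0)
```

Every element of the new approximation depends only on the old one, so the inner loop body could run on all pixels at once; the sketch keeps separate old and new arrays for that reason.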

Parallel implementation of optical flow is possible since relaxation
is clearly a parallel procedure. This is a parallel technique because the
elements of the new approximation, $u^{(k+1)}$, may be computed simultaneously
and in parallel by a network of processors whose inputs are elements of
the old approximation, $u^{(k)}$. As such, it requires the storage of both the
old and new approximations.


B.  KNOWLEDGE DRIVEN SEGMENTATION


The quality of segmentation is crucial to effective machine
understanding. Without the capability of good segmentation of physically
meaningful regions, even the most intelligent processing cannot achieve
satisfactory scene interpretation. In the past, the most useful
segmentation results have been achieved using model driven techniques
which were tuned to work for specific situations. These segmentation
methods work well for very focused applications such as finding bright
targets in low clutter scenes. The performance of these segmentors goes
down rapidly, however, when they have to deal with a wider range of
imagery. There are many applications where computer vision systems have
to function over a wide range of situations. For example, the computer
system for an autonomous land vehicle must function in a variety of
situations which will be affected by different sensors, terrains, weather
conditions, and even the time of day. There are currently no segmentation
methods that work well over this wide variety of situations.

A  way  to  improve  the  process  of  segmentation  and  make  it  robust 
enough  to  handle  a  wide  range  of  scenarios  is  to  use  Artificial 
Intelligence  (AI)  techniques  to  bring  more  knowledge  to  bear  on  the 
problem.  Using  AI  cues,  such  as  knowledge  of  the  terrain  or  weather 
conditions,  can  guide  and  control  the  segmentation  process.  This 
knowledge  can  come  from  other  sensors,  the  processing  results  from 
previous  frames,  or  even  pre-mission  training. 

Various knowledge driven image processing techniques have been
studied. Duane et al. [31] use knowledge driven production rules to evaluate
the region segmentation of an image. Their evaluation is used to group
smaller regions into larger ones. This evaluation can also resegment a
region with new parameters. The segmentation is performed by a single
routine that is data driven, using no other knowledge. At the other
extreme, Nazif and Levine use production rules to control all aspects of
the segmentation process. Rules control the analysis and grouping of
lines and regions as well as the scheduling of different segmentation
tasks.

An attractive approach to using knowledge effectively in the
segmentation process is to incorporate knowledge driven algorithm modules.
Each of the algorithm modules performs a specific step in the processing
flow. To make a given unit knowledge driven, identifying or quantifying
information is included, such as knowledge about the type of sensor
used, the time of day, weather conditions, or actual measurements from the
image. For example, an algorithm to do noise cleaning could use knowledge
about the overall contrast to determine which operations to perform.

A knowledge base needs to be constructed to house all the information
gathered. It can also be used to retain information that is acquired
during the processing chain. In addition, a method is required to control
the information in the knowledge base. Production rules, or "if ... then"
rules, have proved to be a good method for knowledge based control within
a module. A typical rule is made up of an antecedent and a consequent.
The antecedent consists of one or more tests of information in the
knowledge base. If all the tests in the antecedent are true, then the
consequent is executed. The consequent consists of one or more actions
which can affect the knowledge base or execute an image processing
algorithm. For example, a preprocessing rule may be: if
(high_freq_noise) then run (median_filter). An example of a segmentation
rule may be: if (contrast = low) then run (texture_segmentor). Production
rules make segmentation knowledge driven and also provide
increased flexibility and expandability, since production rules can be
added or deleted as need be.
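
The antecedent/consequent structure described above can be sketched in a few lines of Python. The rule engine, the dictionary used as a knowledge base, and the two stub operators are hypothetical stand-ins chosen for illustration; only the rule contents (high_freq_noise, median_filter, contrast, texture_segmentor) come from the examples in the text.

```python
def run_rules(rules, knowledge_base, operators):
    """Fire each rule whose antecedent tests all hold; return the operators run."""
    executed = []
    for antecedent, consequent in rules:
        if all(test(knowledge_base) for test in antecedent):
            for operator_name in consequent:
                operators[operator_name](knowledge_base)
                executed.append(operator_name)
    return executed

# Stub image processing operators: each only records its effect in the knowledge base.
def median_filter(kb):
    kb["high_freq_noise"] = False

def texture_segmentor(kb):
    kb["segmented_by"] = "texture"

operators = {"median_filter": median_filter, "texture_segmentor": texture_segmentor}

# Each rule is (list of antecedent tests, list of consequent operator names).
rules = [
    ([lambda kb: kb.get("high_freq_noise", False)], ["median_filter"]),
    ([lambda kb: kb.get("contrast") == "low"], ["texture_segmentor"]),
]

kb = {"high_freq_noise": True, "contrast": "low"}
executed = run_rules(rules, kb, operators)
```

Because the rules are plain list entries, adding or deleting one does not require touching the control loop, which is the flexibility the text refers to.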

A knowledge driven architecture makes efficient use of all available
knowledge about an image. It also provides an approach for processing
information across multiple sensors. In this way,
information from different sensors can be supplied to the knowledge base
to be retrieved later when processing data from other sensors.

Finally, knowledge based segmentation can be designed to make use of
information across multiple frames of a scene. In many cases information
supplied by one frame can be used to help the processing of successive frames.
Using this system architecture, rules can be adaptively constructed
throughout the processing stages to provide a more reliable segmentation
procedure.

CHAPTER IV

INTERMEDIATE  LEVEL  PROCESSING 

Intermediate level processing attempts to create symbolic
representations from image features for scene understanding. It is a two step
process, where image features are first extracted from a scene and then
grouped into a semantic form suitable for high level processing.

Intermediate level processing is a processing step used to interpret
three dimensional scenes, notably for robotic vision. However, many of
the techniques that will be examined in this chapter have only been
applied to two dimensional scenes. Specifically, scene classification
relies on statistical methods designed to classify non-complex 2-D images.
In addition, the systems which use classification procedures have
typically been developed for a narrow range of applications. They tend to be,
in essence, "dumb" systems. The major significance of the techniques used in
scene classification, however, is that they lay a theoretical framework by
which scenes that are more complex in nature can be processed and
understood.

The classic distinction between pattern recognition and image
understanding arises at the intermediate level of processing. For the pattern
recognition community, segmentation procedures are essential for
classification and ultimately scene understanding. However, the image
understanding community, in an attempt to build general vision systems,
generally agrees that much richer representations of scene surfaces are
needed. Intrinsic characteristics of an image are an example. It is not that the
image understanding community dismisses the utility of segmentation
processes, but rather that it is attempting to create processes that are
knowledge driven and amenable to symbolic processing. Regardless of the
differences, the purpose of this chapter will be to review some of the
important algorithms and processes used in solving the computer vision
problem.

The following paragraphs will review some of the classic pattern
recognition techniques used for image understanding. Most of these
techniques have been developed for 2-D scenes. The emphasis will then shift to
analyzing some of the processes used to represent scenes in the three
dimensional world that create the 2 1/2-D sketch. Then a powerful
computational algorithm will be reviewed that has application throughout all
levels of image understanding processing. Finally, implementation using
parallel architectures and optical processing will be referenced.

A.  SCENE  CLASSIFICATION 

Much of the work in computer vision in the two-dimensional world,
including document processing, radiology, industrial automation, remote
sensing, tracking, etc., has developed because of the advances in pattern
recognition techniques. These techniques have been characterized into two
general forms: statistical (decision-theoretic) and syntactic
(structural). Both of these methods attempt to match feature characteristics of
a scene with 2-D models based on some knowledge representation.

1.  Statistical Pattern Recognition

The two general approaches in statistical pattern recognition
are correlation and feature extraction. Correlation is the process of
comparing an input image against a reference (template) image for all
possible positions in the input field of view. Since a template match is
rarely ever exact, because of noise, quantization effects, etc., a
distance measure, D(m,n), is commonly used. If we represent the image
field as F(j,k), and the template as T(j,k), then the mean-square
difference or error can be defined as,

$$D(m,n) = \sum_j \sum_k \left[ F(j,k) - T(j-m,k-n) \right]^2 \tag{20}$$


A normalized cross correlation, NC(m,n), can be derived from Equation (20) that
makes the correlation invariant on position, such that a template match is
said to exist if NC(m,n) > T(m,n), where T(m,n) is a threshold level. One
of the major limitations of template matching is that an enormous number
of templates must be matched to account for changes in rotation and
magnification of template objects.

Parallel implementation of simple template matching using
correlation has been proposed on the GAPP system [7]. The correlation of a binary
image is performed in a similar fashion as convolution except that
bit-wide exclusive OR operations are used instead of multiplications.
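
The mean-square difference of Equation (20) can be sketched in Python as below. The sketch sums over the template's own indices, which is the same sum as Equation (20) after the substitution j' = j - m, k' = k - n; the function names and the small test image are assumptions for illustration, and no normalization is applied.

```python
def mean_square_difference(image, template, m, n):
    """D(m, n): summed squared difference with the template origin at row m, column n."""
    return sum(
        (image[m + j][n + k] - template[j][k]) ** 2
        for j in range(len(template))
        for k in range(len(template[0]))
    )

def best_match(image, template):
    """Evaluate D(m, n) at every position where the template fits inside the image."""
    rows = len(image) - len(template) + 1
    cols = len(image[0]) - len(template[0]) + 1
    scores = {
        (m, n): mean_square_difference(image, template, m, n)
        for m in range(rows)
        for n in range(cols)
    }
    return min(scores, key=scores.get), scores

# Hypothetical image containing the template at row 1, column 1.
image = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]
template = [
    [1, 2],
    [3, 4],
]
position, scores = best_match(image, template)
```

For binary images each squared difference is 0 or 1 and equals the exclusive OR of the two bits, so D(m,n) reduces to a count of mismatched pixels.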

The use of optical processing in pattern recognition/classification
has become more of a reality over the past few years. Optical
pattern recognition (OPR) has always offered the advantages of high speed
and parallel processing, with the basic linear systems operations of
Fourier transformations and correlation being its hallmark. Research in
recent years has considerably broadened this repertoire. Because of the
introduction of sophisticated optical hardware and improved optical
engineering techniques, many OPR prototype systems are now being
developed.

The correlation method was one of the first techniques used for
optical pattern recognition. The schematic in Figure 52 shows the
process to obtain the correlation function in an optical processing
environment. The input image f(x,y) is given at plane P1. The output at
plane P3 is the Fourier transform of the product of the Fourier transform
F(u,v) of the input function and the filter function H*(u,v) (complex
conjugate of H) of plane P2. This can be expressed as,

$$s(x,y) = \mathcal{F}\{F(u,v)\, H^*(u,v)\} = f \circledast h$$

where $\circledast$ means correlation. The output s(x,y) at plane P3 is found to be
f(-x,-y), or an upside down and reversed image of the input function
f(x,y). This results because an optical system can only perform forward,
not reverse, Fourier transforms. However, since h(x,y) is real and
H*(u,v) = H(-u,-v), the correlation is seen to be equivalent to a
convolution without the coordinate reversal of one of the functions.

Figure 52. Schematic Diagram of an Optical Pattern Recognition Correlator

We can implement the correlation by holographically recording
the pattern on a transparency at plane P2 and making it proportional to
the reference object to be matched. This results in a matched spatial
filter (MSF). The presence of a peak of light in the output plane at P3
indicates that the reference object is present in the input image, and the
location of the peak denotes where the reference object is in the input
scene. Since the system in Figure 52 is linear and shift-invariant,
superposition holds and thus the system is capable of recognizing multiple
occurrences of an input object.
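
The way a correlation peak marks the location of the reference object can be illustrated digitally. The following Python sketch is a one-dimensional, circular analogue of the correlator: it multiplies the transform of the input by the complex conjugate of the transform of the reference and transforms back. Unlike the optical system described above, it uses an inverse transform for the last step, so no coordinate reversal appears. The direct O(n^2) DFT, the function names, and the test signals are assumptions chosen for brevity.

```python
import cmath

def dft(x, inverse=False):
    """Direct discrete Fourier transform; the inverse includes the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [value / n for value in out] if inverse else out

def correlate(f, h):
    """Circular cross-correlation s[x] = sum_t f[t + x] * h[t], computed as F times conj(H)."""
    F, H = dft(f), dft(h)
    product = [a * b.conjugate() for a, b in zip(F, H)]
    return [value.real for value in dft(product, inverse=True)]

# Hypothetical reference pattern h and an input f containing it shifted by 3 samples.
h = [1, 2, 1, 0, 0, 0, 0, 0]
f = [0, 0, 0, 1, 2, 1, 0, 0]
s = correlate(f, h)
peak = max(range(len(s)), key=lambda x: s[x])
```

The index of the largest output value gives the shift of the reference pattern within the input, which is the role played by the peak of light at P3.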

A matched filter is deterministic in nature. It does not
include statistical features of the patterns to be classified. In
particular, the matched filter is sometimes too sensitive to differences
between patterns which are required to be grouped together as one class of
objects. Many times it incorrectly matches patterns that look alike but
are quite different. The problem is to find unique features so that the
intraclass variations of one statistically distributed object become a
minimum, and the separation of different classes a maximum. Refer to
Figure 53. For completeness, it should be mentioned that there has been
much work in the area of matched filtering for stochastic image fields.

The second technique used in pattern recognition is feature
extraction. Feature extraction can be described as a process of obtaining
a feature vector, which consists of a set of scalar features, implementing
a feature extractor, and finally using a classification scheme. The
features can include: geometrical features, Fourier coefficients, Mellin
coefficients, moments, etc. The feature extractor attempts to extract the
most prominent features, which usually involves projection of the measured
feature vector onto one or several discriminant vectors by a vector inner
product operator. Finally, the classifier determines the object
class from the projection value(s) obtained. Some typical classifiers
are: minimum distance, nearest neighbor, minimum probability of error,
etc. The effectiveness of these methods usually depends on the type of
imagery, noise and clutter, target-type, speed requirements, hardware
limitations, etc.

Figure 53. Classification Problem in OPR

Two techniques mentioned for constructing the feature vector
were moments and Fourier boundary descriptors. The geometric moments of
an input object f(x,y) are defined by,

$$M_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\, x^p y^q \, dx\, dy \tag{22}$$

and the feature vector for an object in class i is denoted by $M_i$ (its
elements are the $M_{pq}$ in Equation 22). In a digital environment, Equation
(22) would be expressed as,

$$M_{pq} = \sum_x \sum_y x^p y^q f(x,y) \tag{23}$$

where f(x,y) equals zero or unity for a binary image. The moments of a
binary image are found by digitally summing over the image pixels weighted
by the $(x^p y^q)$ monomial masks. In addition, a set of moment equations can be
derived that are invariant under translation, rotation, and magnification.
These invariant moments can be used in the classification and matching
process irrespective of the object's spatial location, rotation, and/or
magnification.
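
Equation (23) translates directly into code. The following Python sketch computes geometric moments of a small binary image and, as one simple example of translation invariance, moments taken about the centroid. The convention that x is the column index and y the row index, the function names, and the test image are assumptions for this example; the full rotation and magnification invariants mentioned above are not implemented.

```python
def moment(image, p, q):
    """M_pq = sum over x, y of x**p * y**q * f(x, y); x is the column, y the row."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )

def central_moment(image, p, q):
    """Moment about the centroid, which does not change when the object is translated."""
    m00 = moment(image, 0, 0)
    xc, yc = moment(image, 1, 0) / m00, moment(image, 0, 1) / m00
    return sum(
        ((x - xc) ** p) * ((y - yc) ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )

# Hypothetical binary image: a 3x2 block of ones.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
area = moment(image, 0, 0)
centroid = (moment(image, 1, 0) / area, moment(image, 0, 1) / area)

# The same block shifted one column to the right.
shifted = [[0] + row[:-1] for row in image]
```

Here M_00 is the object area and (M_10/M_00, M_01/M_00) its centroid; the central moments of `image` and `shifted` agree even though their raw moments differ.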


It is possible to envision a parallel architecture to perform
moment calculation. Once the pixels that comprise the object boundary
are known, the moments of the image can be found by computing the
moments in each processing element separately and then summing across the
processing elements using recursive doubling [33]. This scheme requires that
each PE know its absolute position in the configuration, since the
weighting of one of the moments in each PE is dependent upon the PE address.
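
The summation step can be illustrated with a small serial simulation. The Python sketch below assumes one partial value per PE, a power-of-two number of PEs, and a butterfly exchange pattern in which PE i combines with PE (i XOR stride) at each step; this is one common form of recursive doubling and is not necessarily the scheme of reference 33. The column counts and names are hypothetical.

```python
def recursive_doubling_sum(values):
    """Simulate a log2(n)-step all-PE sum; every PE ends up holding the total."""
    partial = list(values)
    n = len(partial)
    if n & (n - 1):
        raise ValueError("number of PEs is assumed to be a power of two")
    stride, steps = 1, 0
    while stride < n:
        # In hardware all PEs would perform this exchange-and-add simultaneously.
        partial = [partial[i] + partial[i ^ stride] for i in range(n)]
        stride *= 2
        steps += 1
    return partial, steps

# Hypothetical case: PE number x holds the count of object pixels in image column x.
column_counts = [0, 2, 2, 2, 0, 0, 0, 0]
m00_terms = column_counts
# The M_10 contribution of each PE is weighted by its own address x.
m10_terms = [address * count for address, count in enumerate(column_counts)]

m00, steps = recursive_doubling_sum(m00_terms)
m10, _ = recursive_doubling_sum(m10_terms)
```

The address weighting in `m10_terms` is the reason each PE must know its absolute position, while the sum itself needs only three exchange steps for eight PEs.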

The moments can also be calculated optically on the system in
Figure 54. With different monomial masks $k(x,y) = x^p y^q$ present on
different spatial frequency carriers at P2, the output pattern is
simply,

$$\iint f(x,y)\, k(x,y)\, dx\, dy \tag{24}$$

which corresponds to the moments of the P1 input f(x,y), each located at a
spatially different position in P3. Researchers have developed ways to
synthesize these masks.




The moments calculated up to a particular order make up the
feature vector. The feature extractor attempts to extract the most
important features of a class. The feature extractor consists of
transforming the observed moments, $M'$, with the scaling vector, $w^t$, into a
Fisher space by,

$$M = w^t M' \tag{25}$$

which reduces the number of features for easier discrimination. Estimates
of the class i of the input object are obtained from Fisher projections. A
second level classifier, i.e., the Mahalanobis distance, is then used to
minimize the distance between the input vector, $M'$, and the reference
vectors, $M_i$, for each class i.

The Fisher linear discriminant function has been used
successfully in pattern classification. It is a linear function that provides
the maximum ratio of between-class scatter to within-class scatter. For
example, suppose that we want to classify a pattern into either class i or
j; then it is possible, using Fisher's linear discriminant, to partition
the feature space into two regions by a hyperplane decision surface. If
this surface is positioned correctly, it provides a simple method for
classifying patterns into one of two classes.
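
A two-class, two-feature case of the linear discriminant can be worked through in Python as follows. The direction is taken as the inverse of the within-class scatter matrix applied to the difference of the class means, which is the standard closed form for maximizing the scatter ratio. Placing the decision threshold midway between the projected class means, the sample points, and all names are assumptions made for this sketch.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def mean(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(2)]

def scatter(vectors, m):
    """2x2 scatter matrix: sum of outer products of deviations from the class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - m[0], v[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def fisher_discriminant(class_i, class_j):
    """Return the projection vector w and a midpoint threshold (an assumption)."""
    mi, mj = mean(class_i), mean(class_j)
    si, sj = scatter(class_i, mi), scatter(class_j, mj)
    sw = [[si[a][b] + sj[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mi[0] - mj[0], mi[1] - mj[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    threshold = 0.5 * (dot(w, mi) + dot(w, mj))
    return w, threshold

def classify(x, w, threshold):
    """Points projecting above the threshold fall on the class i side of the hyperplane."""
    return "i" if dot(w, x) > threshold else "j"

# Hypothetical feature vectors for the two classes.
class_i = [(1, 1), (2, 1), (1, 2), (2, 2)]
class_j = [(5, 5), (6, 5), (5, 6), (6, 6)]
w, threshold = fisher_discriminant(class_i, class_j)
```

With these samples the decision surface is the line where the projection onto w equals the threshold, and any new feature vector is assigned by a single inner product and comparison.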

There has been research into implementing Fisher's linear
discriminant function using a systolic array [35]. They have handled the
classification problem for a more real-world, complex situation where there are
more than two classes to discriminate.

Finally, a hybrid optical/digital architecture has been
suggested to perform the classification procedure. Excellent performance, over
90% correct class recognition, has been obtained with this parallel
approach. Researchers are now in the process of prototyping this concept.

Fourier boundary descriptors have also been used as a feature
vector in the classification process. The Fourier descriptors can be
shown to be closely related to moments through the joint characteristic
function. The Fourier descriptors are determined from the outline or
contour of the input image. Obviously the scene in the image would already
have to be segmented in order to arrive at its contour.

To determine the Fourier descriptors, the contour line of some
region or scene is described as a complex function of arc length t, as
depicted in Figure 55, so that,

$$z(t) = \mathrm{Re}[z(t)] + i\, \mathrm{Im}[z(t)] \tag{26}$$

This function is periodic with respect to its perimeter T and band limited
because of its finite number of sampling points. Therefore it can be
approximated by a Fourier series with (N+1) coefficients,

$$z(t) = \sum_{m=-N/2}^{N/2} C_m e^{i m \omega t} \tag{27}$$

with $\omega = 2\pi/T$. The $C_m$ are defined by,

$$C_m = \frac{1}{T} \int_0^T z(t)\, e^{-i m \omega t}\, dt \tag{28}$$

As the values z(t) are complex numbers, the positive and negative Fourier
coefficients, $C_m$, are independent and can be represented as,

[equation illegible in source]  (29)

It can be shown that a set of Fourier descriptors, invariant with respect
to location, size, and rotation, can be derived from a truncated set of
Fourier coefficients [36]. The Fourier descriptors can be used as a set of
features in the classification
process.
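
The invariance claim can be checked numerically. The Python sketch below approximates the coefficients C_m by a discrete sum over equally spaced contour samples and forms descriptors as the magnitudes |C_m| / |C_1|. That particular normalization is one common choice and is an assumption here, as are the function names and the rectangular test contour; it requires |C_1| to be nonzero, which holds for this counter-clockwise contour.

```python
import cmath

def fourier_coefficients(contour, orders):
    """C_m approximated by a discrete sum over N equally spaced complex samples."""
    n = len(contour)
    return {
        m: sum(z * cmath.exp(-2j * cmath.pi * m * k / n)
               for k, z in enumerate(contour)) / n
        for m in orders
    }

def descriptors(contour, orders=(-2, -1, 2, 3)):
    """Magnitudes |C_m| / |C_1|; C_0 (the position term) is deliberately excluded."""
    coeffs = fourier_coefficients(contour, set(orders) | {1})
    scale = abs(coeffs[1])
    return [abs(coeffs[m]) / scale for m in orders]

# Hypothetical contour: a 4x2 rectangle sampled at unit arc length, counter-clockwise.
contour = [0, 1, 2, 3, 4, 4 + 1j, 4 + 2j, 3 + 2j, 2 + 2j, 1 + 2j, 2j, 1j]

# The same shape scaled by 3, rotated by 0.7 rad and translated by (5, -2).
transformed = [3 * cmath.exp(0.7j) * z + (5 - 2j) for z in contour]

# The same shape traced from a different starting point.
restarted = contour[4:] + contour[:4]

base = descriptors(contour)
after_transform = descriptors(transformed)
after_restart = descriptors(restarted)
```

Translation only changes C_0, rotation and a change of starting point only change the phases of the C_m, and scaling is divided out by |C_1|, so the three descriptor lists agree to rounding error.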


The Fourier descriptor calculations are based on Fourier
transform theory and consequently have the potential of being
implemented in a parallel environment. The differences would reside in
programming the PEs to calculate the invariant Fourier descriptors.

The calculation of the Fourier coefficients results in a large
number of features since the number of Fourier coefficients is
proportional to the space bandwidth product of the input image, which can
typically approach 2.5 x 10^8. Usually one is forced to use a technique which
reduces the number of features for easier classification. Researchers in
optical processing have had success with the wedge ring detector (WRD)
sampling system for reducing the number of features for the classification
process. The details of the WRD can be found in [34]. Experiments have shown
reductions from a bandwidth of 2.5 x 10^8 to 64 features. Then a
Karhunen-Loeve transformation can be used to provide an efficient intra-class
discrimination for feature extraction, while some type of discriminant function
can be used for final classification.

Another process used for classification has been the Hough
transform. Originally the Hough transform was used to detect lines [37] and
later was used to detect curves that could be expressed in an analytic
form, for example, circles, ellipses, etc. However, researchers [38]
extended the application to detecting, segmenting and classifying
arbitrary shapes in the analysis of real images. Given an arbitrary shape, S,
the generalized Hough technique provides a mapping from the orientation of
an edge-element to the set of instances of S, modified by location,
rotation, and uniform scaling, which could have given rise to that edge
element. This mapping allows all local evidence for a particular instance
of S to contribute to global decisions about the figure. In practice, the
Hough transform is similar to implementing a generalized matched filtering
strategy, that is, template matching.

The algorithm for the generalized Hough transform can be
generally described as constructing a parameter space from an image space,
where the parameter space is used to locally describe the shape. The
generalized Hough transform can also accommodate translation, rotation, and
scale variations in images. An attractive feature of this transform is
that it will work even when the boundary is disconnected due to noise or
occlusions. This is generally not true for strategies which track edge
segments. Finally, the generalized Hough transform can be implemented in
parallel.

The Hough algorithm for detecting and classifying arbitrary
shapes, with fixed orientation and scale, can be described as follows.
Given an arbitrary shape, depicted in Figure 56, then for each point
$\mathbf{x}$ on the boundary with the gradient direction, $\phi$,
increment a point $\mathbf{a} = \mathbf{x} + \mathbf{r}$. Solving for
$\mathbf{r}$ we get $\mathbf{r} = \mathbf{a} - \mathbf{x}$. The fact that
$\mathbf{r}$ varies in an arbitrary way means that the generalized Hough
transform for an arbitrary shape is best represented by a table called the
R-table. The R-table is constructed by first choosing a reference point,
$\mathbf{y}$, for the shape. Then for each boundary point, $\mathbf{x}$,
compute $\phi(\mathbf{x})$, the gradient direction, and
$\mathbf{r} = \mathbf{y} - \mathbf{x}$. Store $\mathbf{r}$ as a function of
$\phi$. The algorithm can be stated as:



(1) construct an R-table for the shape to be matched

(2) for each edge point $\mathbf{x}'$ in the image (using an edge operator
that supplies direction)

(a) compute $\phi(\mathbf{x}')$

(b) calculate the "predicted" reference points,
$\mathbf{y} = \mathbf{x}' + \mathbf{r}(\phi)$

(c) increment an accumulator array, $A(\mathbf{y}) = A(\mathbf{y}) + 1$

(3) maxima in the accumulator array indicate the translation

Rotation and scaling parameters can also be included by adding
these two parameters to the shape description. This results in the
accumulator array increasing to four dimensions, corresponding to the
parameters (y, s, phi), where the reference point y contributes two of the
four, and in an increase in the information in the R-table. Many
other improvements and variations can be added to the generalized Hough
transform to make it a valuable computational tool for classification
and intermediate level processing.
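
The R-table construction and the voting steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the cited researchers: orientation and scale are fixed, gradient directions are quantized into `n_bins` bins, and the names `angle_bin`, `build_r_table`, and `vote` are hypothetical.

```python
import math
from collections import defaultdict

def angle_bin(phi, n_bins):
    """Quantize a gradient direction (radians) into one of n_bins bins."""
    return int((phi % (2 * math.pi)) / (2 * math.pi) * n_bins) % n_bins

def build_r_table(boundary, reference, n_bins=8):
    """Step (1): store r = y - x for every boundary point, indexed by phi."""
    table = defaultdict(list)
    for (x, y), phi in boundary:
        table[angle_bin(phi, n_bins)].append((reference[0] - x, reference[1] - y))
    return table

def vote(edge_points, r_table, n_bins=8):
    """Step (2): each edge point votes for its predicted reference points."""
    acc = defaultdict(int)
    for (x, y), phi in edge_points:
        for rx, ry in r_table.get(angle_bin(phi, n_bins), []):
            acc[(x + rx, y + ry)] += 1          # A(y) = A(y) + 1
    return acc

# Toy shape: four boundary points with their gradient directions.
shape = [((1, 0), 0.0), ((0, 1), math.pi / 2),
         ((-1, 0), math.pi), ((0, -1), 3 * math.pi / 2)]
r_table = build_r_table(shape, reference=(0, 0))

# The same shape translated by (5, 3) plays the role of the image edges.
edges = [((x + 5, y + 3), phi) for (x, y), phi in shape]
acc = vote(edges, r_table)
peak = max(acc, key=acc.get)                    # step (3): accumulator maximum
```

In this toy case every edge point votes for the same cell, so the accumulator maximum recovers the translation of the reference point. Adding rotation and scale would mean looping over those two parameters as well and indexing the accumulator by four values.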

Finally, the Hough transform has the potential of being implemented
in a massively parallel computing network. The PIPE system [39] has
been used to perform the Hough transform. The idea is that all the
"voting" (i.e., incrementing the accumulator array) can be hardwired to be
performed in one step.

Figure 56. Geometry for the Generalized Hough Transform

Researchers [40] have also investigated using a neural network to
solve this computationally intensive problem. The idea is to transform from
a complex information flow of voting to a complex wiring architecture
carrying simple excitatory and/or inhibitory responses.

Finally, there has been a proposal to implement a projection
processor [41] to calculate the Hough transform in an optical environment.

2. Syntactic Pattern Recognition

In many situations, implementation of statistically based
pattern recognition techniques for an image understanding system tends
to become computationally intensive. As the image (pattern)
to be analyzed becomes increasingly complex, the number of patterns, each
with an n-dimensional feature vector, grows. The syntactic approach to
pattern recognition attempts to solve this problem by representing a
complex pattern by its simpler subpatterns (pattern primitives) and then
applying the appropriate statistical methods to the subpatterns.

In syntactic methods, a pattern is represented by a sentence (a
string or a tree) in a language specified by a grammar. The language,
which is used to describe the structure of the patterns, is referred to as
the pattern description language. The rules governing the composition of
primitives into patterns are specified by a pattern grammar. After each
primitive within the pattern is identified, the recognition process is
accomplished by performing a syntax analysis, or parsing, of the
"sentence" describing the given pattern to determine whether or not it is
syntactically (or grammatically) correct with respect to the specified
grammar. It is common to perform some preprocessing and segmentation
operations on the raw image data before the parsing procedure is
implemented.

The syntactic approach to pattern recognition provides a
capability for describing a large set of complex patterns by using small sets
of simple pattern primitives and grammatical rules. Also, one of the more
attractive aspects is the recursive nature of a grammar. A grammar rule
can be applied any number of times, so it is possible to express in a very
compact way some basic structural characteristics of a large domain of
sentences.

An alternative way to represent the structural information about
a pattern is to use a relational graph. A relational graph is used to
describe a scene in more detail, where relations between various
subpatterns and primitives are specified. In a relational graph the nodes
represent subpatterns and branches represent relations among subpatterns.
A relational graph with such indications is called a directional graph.

The two key areas of importance in syntactic pattern recognition
are the selection of: 1) subpatterns and 2) trees or relational graphs
along with an efficient procedure for analyzing tree and graph structures.
For line patterns or patterns described by boundaries or skeletons, line
segments are often suggested for pattern primitives. For example, a line
segment can be characterized by the locations of its beginning (tail) and
end (head), by its length, and/or its slope. A curve segment can be
described in terms of its length and curvature. Finally, shape and
texture have been used as pattern primitives for describing regions.

After pattern primitives are selected, the next step in describing
the pattern is the construction of a grammar. As mentioned previously,
a grammar is used to generate a language. Unfortunately, as the
descriptive power of a language is increased so is the complexity of the
syntax analysis system (i.e., recognizer or acceptor).

Finally, in relation to practical applications, stochastic
languages and error-correcting parsing have been suggested. Stochastic
languages are used to resolve the uncertainties that arise due to the
presence of noise and variation in the pattern measurements, segmentation
error, and primitive extraction error. Generally, stochastic languages
entail associating probabilities with grammar rules. For example, if a
sentence is found to be generated by two different pattern grammars, the
ambiguity can be resolved by comparing the probabilities of the sentence
in the two grammars. Then, a maximum-likelihood or Bayes decision rule
will yield the final recognition.

Error-correcting parsing handles noisy and distorted patterns by
using similarity measures. Similar to the template matching method in
statistical pattern recognition, a similarity or distance measure can be
defined between a sentence representing an unknown pattern and a sentence
describing a prototype pattern. Recognition of the unknown pattern can be
carried out on the basis of the maximum similarity or minimum distance
criterion.
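
The minimum distance criterion can be illustrated with a short Python sketch. It assumes that patterns are strings of single-character primitives and that the distance is the Levenshtein (edit) distance; the text does not name a particular measure, so that choice, the prototype strings, and the function names `edit_distance` and `classify` are all illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb))) # substitute or match
        prev = curr
    return prev[-1]

def classify(sentence, prototypes):
    """Return the class whose prototype sentence is nearest to `sentence`."""
    return min(prototypes, key=lambda name: edit_distance(sentence, prototypes[name]))

# Hypothetical prototypes: each letter stands for one boundary primitive.
prototypes = {"square": "aaabbbcccddd", "triangle": "aaaeeefff"}

# A noisy square with two primitives lost during segmentation.
label = classify("aabbbccddd", prototypes)
```

A full error-correcting parser measures distance against every sentence a grammar can generate rather than against a single prototype string; the sketch covers only the prototype comparison described above.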

A general procedure using primitives for shape description has
been suggested. For example, if a texture pattern can be modeled as an
arrangement of primitive patterns (deterministic or stochastic), then a
syntactic method can be applied to texture modeling and discrimination.
Gray level values, l, of either single pixels or a homogeneous group of
pixels (NxN) may serve well as the primitives. From a structural point of
view, texture is the placement of structured subpatterns. To account for
this characteristic, pictures are divided into fixed-size windows. Refer
to Figure 57. A grammar is then used to characterize the windowed
pattern (KxK) of the given texture. Assuming that the window is of size
(KxK), there are $l^{K^2}$ possible patterns. The set of all the windowed
patterns of a particular texture is a subset of these patterns. A high
dimensional regular grammar, for example, a tree grammar, is suitable for
this characterization of textured patterns.

To create the tree representation, each pixel in a (KxK) window is
made to correspond to a node. Hence, a pattern primitive becomes the
label assigned to its corresponding node. A tree structure can be used to
define the window and the labels on the nodes are used to define the
pattern. A tree grammar can then be formulated that will generate the
tree representation. Obviously, the choice of the tree structure
determines the complexity as well as the effectiveness of the constructed
tree grammar.

An example of a noise-free pattern is shown in Figure 58. The
pattern in (a) has the tree representation (b) for the tree structure model
in (c). The tree grammar, G, used to generate the tree representation in
Figure 58 is given in Figure 59. Vy is the set of terminal and
nonterminal symbols, r is the rank associated with the symbols in Vy, Py is
the set of production rules, Ay is the starting symbol, and Vy is the set
of terminal symbols.

Figure 59. Tree Grammar for a Noise-Free Pattern in Figure 58

Since the parsing procedure for recognition is, in general,
nondeterministic, it is regarded as computationally inefficient. Most of the
early improvements have been in developing efficient sequential
procedures. Of course, when a sequential procedure is used, the parsing
procedure stops most of the time before a sentence is completely scanned,
which can prevent recovering complete structural information of the pattern.
Another approach to reduce the parsing time is the use of parallel
processing.

Researchers have implemented Earley's parsing algorithm in
parallel. Earley's algorithm is a non-backtracking tabular parsing
procedure used for context-free languages. They were able to reformulate
Earley's algorithm from a sequential to a parallel form. Then they
constructed an architecture to implement this parallel algorithm, which
operated in a pipelined and parallel fashion using VLSI technology.
Earley's algorithm has had application in shape recognition.
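
For orientation, the tabular, non-backtracking character of Earley's algorithm can be seen in a small sequential recognizer. The sketch below is the ordinary sequential form, not the parallel VLSI reformulation described above. It assumes a grammar given as a dictionary from nonterminals to lists of right-hand sides, with no empty productions; the name `earley_recognize` and the toy grammar are illustrative.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` is a sentence of the context-free grammar.

    grammar: dict mapping each nonterminal to a list of right-hand sides,
    each a non-empty tuple of symbols. Symbols that are not keys of
    `grammar` are treated as terminals.
    """
    n = len(tokens)
    # chart[i] holds items (lhs, rhs, dot, origin) valid after i tokens.
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                      # predictor
                    for prod in grammar[sym]:
                        item = (sym, prod, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
                elif i < n and tokens[i] == sym:        # scanner
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                       # completer
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])

# Toy grammar: S -> a S b | a b, i.e. n copies of 'a' followed by n of 'b'.
grammar = {"S": [("a", "S", "b"), ("a", "b")]}
accepted = earley_recognize(grammar, "S", "aabb")
rejected = earley_recognize(grammar, "S", "aab")
```

Each chart column depends only on earlier columns, and the items within a column can be processed in any order, which is the kind of structure a pipelined or parallel formulation can exploit.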

B.  THE  2  1/2  D  SKETCH 

The 2 1/2 D sketch attempts to represent complex three-dimensional
objects in a scene. This is a classification procedure, where now
three-dimensional information is used to separate classes of objects within
an image. Intrinsic surface characteristics such as depth, shading,
orientation, and texture are properties used to recover regions of
homogeneous groupings corresponding to three-dimensional representations. The
grouping process produces a symbolic form of the scene which later can be
used for higher level processes, such as matching and scene
interpretation.

The explicit representation of intrinsic surface characteristics
marks a critical transition from pictorial information, created from the
traditional 2-D pattern recognition techniques, to information about the
scene itself. Understanding the computational process by which intrinsic
scene characteristics are recovered from an image has been a major focus
of research by the artificial intelligence community interested in image
understanding for computer vision. The following paragraphs will discuss
this in some detail.

One may ask why I have chosen to incorporate intrinsic
characteristics of the scene as part of intermediate level processing and
not part of early level processing. The reason for this approach is that
the intrinsic characteristics correspond to a 3-D world and therefore
provide a richer description of a scene than a two-dimensional
representation. In addition, these intrinsic characteristics have a propensity to
segment and classify imagery all in one step because of the implicit
information that they supply. Finally, processing on the intrinsic
properties of an image is similar to the way a human gains knowledge about a
scene. That is, many top-down, high-level assumptions, such as depth and
orientation, are made when a human interprets a scene. These assumptions
are based on previously learned experiences about the way the world is
structured. The human visual system tends to utilize the intrinsic cues
in the form of symbolic representations as an inference base for knowledge
planning. Since the intrinsic characteristics, which produce the 2 1/2 D
sketch, provide such a rich description of a scene, they will be
categorized as intermediate level processing.

The primary intrinsic images are surface reflectance, surface
orientation, and incident illumination. Others include range, transparency,
and specularity. The distance and orientation images correspond to a
representation that was formally proposed by Marr [1], called the 2 1/2 D
sketch. The distance image gives the range along the line of sight from
the center of projection to each visible point in the scene. The
orientation image gives a vector representing the direction of the surface
normal at every point. The reflectance image gives the albedo (i.e., the
ratio of total reflected to total incident illumination). Finally, the
illumination image gives the integrated incident illumination from all
sources.

A simple example of intrinsic images and their usefulness in image
understanding for a computer vision system can be described from
experiments with a scanning laser rangefinder, which measures the properties of
distance (depth) and apparent reflectance. Since range data is
uncorrupted by reflectance variations, and the amplitude data is unaffected by
ambient lighting and shadows, extracting surfaces which are planar or have
uniform reflectivity becomes an easier task. Such tasks are usually very
difficult to perform reliably in gray level imagery, but with pure range
and amplitude data, segmentation techniques, such as thresholding and
region growing, perform well. Also, with the addition of some logic,
relative to the information provided by the rangefinder, segmentation
algorithms can be used to segment images into surfaces where reflectivity
is constant and range is continuous. This is an important step for any
robotic vision understanding system.
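
As a rough illustration of segmenting into surfaces where reflectivity is constant and range is continuous, the following Python sketch grows regions over a pair of registered range and amplitude images. It is a minimal example under stated assumptions, not the logic of any particular rangefinder system: images are lists of rows, neighbors are 4-connected, and the tolerances `range_tol` and `amp_tol` and the name `segment` are hypothetical.

```python
from collections import deque

def segment(range_img, amp_img, range_tol, amp_tol):
    """Label 4-connected regions in which range and amplitude both change
    by no more than the given tolerances between adjacent pixels."""
    rows, cols = len(range_img), len(range_img[0])
    labels = [[0] * cols for _ in range(rows)]   # 0 means "not yet labeled"
    n_regions = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue
            n_regions += 1                       # start a new region here
            labels[r][c] = n_regions
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < rows and 0 <= nx < cols) or labels[ny][nx]:
                        continue
                    if (abs(range_img[ny][nx] - range_img[y][x]) <= range_tol
                            and abs(amp_img[ny][nx] - amp_img[y][x]) <= amp_tol):
                        labels[ny][nx] = n_regions
                        queue.append((ny, nx))
    return labels

# Toy data: a near surface on the left, a far surface on the right, and one
# near pixel with a different reflectance.
range_img = [[1.0, 1.1, 5.0],
             [1.0, 1.2, 5.1]]
amp_img = [[0.8, 0.8, 0.8],
           [0.8, 0.3, 0.8]]
labels = segment(range_img, amp_img, range_tol=0.5, amp_tol=0.1)
```

Because neighbors are compared pairwise, a slow drift in amplitude across a region is tolerated; comparing against the region mean would enforce constant reflectivity more strictly.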

An image understanding system for machine vision requires the ability
to interpret three-dimensional scenes from a two-dimensional
representation. The gray level intensity values (shading) that represent a scene
can be used to determine three-dimensional surface shape. Simply
stated, Horn has devised a relationship between shape and intensity, I,
at a point (x,y) in an image as,

    $I(x,y) = R(p,q)$    (31)

where, for a surface with height f(x,y),

    $p = \partial f / \partial x, \quad q = \partial f / \partial y$    (32)

which he called the image irradiance equation. Refer to Figure 60.
This is a nonlinear first-order partial differential equation. The
function R(p,q) includes information such as position of the viewer,
distribution of light sources (assumed to be fixed), and the reflectance
characteristics of the surface material. For a fixed distribution of
light sources and fixed reflectance characteristics, the image irradiance
equation associates a brightness value with each surface orientation.
That is, a brightness value is assigned to each point in the pq gradient
space. The gradient refers to the orientation of a physical surface.
Distance from the origin of the gradient space equals the slope of the
surface, while the direction is the direction of the steepest descent.
This representation has been called the reflectance map.
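
To make the reflectance map concrete, the sketch below evaluates R(p,q) for one specific case: an ideal Lambertian surface lit by a single distant source whose direction is given in gradient space as (ps, qs). The text does not commit to a reflectance model, so the Lambertian assumption, the `albedo` parameter, and the function name are illustrative choices.

```python
import math

def lambertian_reflectance(p, q, ps, qs, albedo=1.0):
    """Reflectance map R(p, q) of a Lambertian surface.

    (p, q) is the surface gradient and (ps, qs) the gradient-space
    direction of a single distant light source. The value is the albedo
    times the cosine of the angle between the surface normal (-p, -q, 1)
    and the source direction (-ps, -qs, 1), clipped at zero for
    orientations that face away from the source.
    """
    cos_i = (1.0 + p * ps + q * qs) / (
        math.sqrt(1.0 + p * p + q * q) * math.sqrt(1.0 + ps * ps + qs * qs))
    return albedo * max(0.0, cos_i)

# Brightness of a surface patch facing the viewer, lit from the viewer.
frontal = lambertian_reflectance(0.0, 0.0, 0.0, 0.0)
# Brightness of a patch tilted 45 degrees in x under the same light.
tilted = lambertian_reflectance(1.0, 0.0, 0.0, 0.0)
```

Evaluating the function on a grid of (p, q) values gives the contours of constant brightness that are usually drawn for a reflectance map.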

Horn has applied the characteristic strip method for solving Equation
(31) directly. The procedure to solve this nonlinear first-order partial
differential equation is to set up an equivalent set of five ordinary
differential equations, three for the coordinates and two for the
components of the gradient. These five equations can be integrated numerically
along certain curved paths. The curve traced out on the object in this
manner is called a characteristic. Then the process can be repeated for a
series of steps in a specific direction. The shape of the surface is thus
given as a sequence of coordinates on characteristics along its surface.
Two disadvantages of this direct approach are that it is sensitive to noise
and that the base characteristic is data-dependent, so it is difficult to
determine the initial starting points.
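
For reference, the five ordinary differential equations take the following form in the usual textbook statement of the characteristic strip method. The notation is an assumption added here and does not appear in the text: s is the step parameter along the characteristic, z is the surface height, R_p and R_q are the partial derivatives of the reflectance map, and I_x and I_y are the partial derivatives of the image intensity.

```latex
\begin{aligned}
\frac{dx}{ds} &= R_p, &
\frac{dy}{ds} &= R_q, &
\frac{dz}{ds} &= p\,R_p + q\,R_q, \\
\frac{dp}{ds} &= I_x, &
\frac{dq}{ds} &= I_y .
\end{aligned}
```

The first three equations advance the coordinates and the last two advance the gradient components, so integration needs a starting point at which x, y, z, p, and q are all known.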

An iterative technique has been developed by Ikeuchi and Horn [44] to
solve this equation. Generally, they formulated the system as a
variational problem with consistency (error in gradients around a closed loop)
as a penalty term. Their method is less sensitive to noise and is also
amenable to parallel implementation.

As suggested before, texture is a common property used to distinguish
surfaces. Similar to the process by which the human visual system
operates, texture elements of varying size in an image are commonly perceived
as a three-dimensional surface. The methods by which one determines
surface orientation from this process are referred to as texture gradient
techniques. Gibson [45] was one of the first investigators to use texture
gradients in determining surface orientation. The general idea is that if
the texture image is segmented into primitives, the maximum rate of change
of the projected size of these primitives constrains the orientation of
the plane. The direction of maximum rate of change of projected primitive
size is the direction of the texture gradient. These ideas are related to
the gradient space discussed before.

Witkin [46] has developed a computational approach to recover surface
shape and orientation from the texture gradient. Witkin's approach was
novel in that he developed a procedure using texture gradients, similar to
that proposed by Gibson [45] and others [47]; however, he relaxed the
constraints that defined the texture geometry so that his algorithm could be
applied to a wide variety of natural scenes. In general, his technique
recovers shape from texture contours in an image.

The  general  procedure  that  Witkin  used  to  estimate  the  orientation  of 
a  surface  was  to  develop  a  series  of  geometric  and  statistical  models.  A 
geometric  model  was  used  to  provide  a  relation  between  the  orientation  of 
a  texture  primitive  on  a  surface,  the  tangent  to  a  curve  on  the  surface, 
and  the  corresponding  tangent  in  the  image.  Then  he  derived  a  statistical 
model  that  expressed  the  expected  distribution  of  tangents  on  a  surface 
and  the  corresponding  distribution  after  projection.  Finally,  an 
estimator  was  derived  that  expressed  the  probability  density  function  for 
surface  orientation  determined  by  the  geometric  and  statistical  models. 

Witkin's first implementation was on planar surfaces, where the
entire image surface is oriented uniformly. He used the $\nabla^2 G$
operator to obtain the texture contour points and then grouped the tangent
directions into a histogram. An estimator was used to find the maximum
likelihood surface orientation.

Witkin  also  applied  his  algorithm,  with  some  modifications,  to  curved 
surfaces.  However,  rather  than  computing  the  tangent  direction  across  the 
image,  a  local  distribution  was  computed  on  a  circular  region  surrounding 
each  image  point.  Witkin  had  much  success  in  both  applications. 

C. Relaxation

The purpose of a computer vision system is to produce a description
of an image similar to that produced by a skilled observer. This
requires application of many of the image processing and pattern
recognition techniques already discussed. Some of them include image bandwidth
compression, image restoration and enhancement, extraction of homogeneous
regions, measuring features that characterize and/or classify these
segments, and measuring relations between these segments (i.e., brighter
than, smaller than, etc.). The output of this complex sequence of events
is far from being the array of gray level intensities that was initially
captured by the imaging system. Rather, the output has been transformed
into a representation that is more appropriate for high level processing,
that is, a symbolic description such as a labeled graph or semantic
network.

In addition to the information that is extracted from the scene, we
are also supplied with knowledge about the type of scene that we wish to
describe (the model). This knowledge representation can also be
transformed into a labeled graph. The solution then becomes one of matching
the graph network of the scene to the graph network of the model. This
global matching procedure is called relaxation, since it resembles an
iterative procedure in numerical analysis. The use of relaxation is
considered by many as a powerful algorithmic tool for symbolic image
description and image understanding throughout all levels of processing.
Some examples of where relaxation has been used are edge detection, shape
from shading, optical flow, line labeling, and semantic net matching.

There are two classes of relaxation processes, the discrete and the
continuous (or probabilistic). The discrete case simply associates a set
of possible labels, or names, with an image part, each label being either
allowed or not (i.e., 0/1). The continuous case involves assigning a
likelihood (probability) to this association. Most
of the relaxation techniques incorporated into modern vision systems use
the continuous case, especially when the image is complex in nature. Both
of these classes utilize two general models to generate the labeling
scheme. The first is the neighborhood model, which determines which parts
in the image directly communicate with other parts. The second is an
interaction model, which defines how an image part changes its labeling
based on the labeling of its neighbors.
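
The discrete case can be illustrated with a short label-discarding loop in Python. It is a minimal sketch under stated assumptions: the neighborhood model is a dictionary of neighbor lists, the interaction model is a user-supplied predicate `compatible`, and a label is dropped as soon as some neighbor has no remaining label compatible with it. The two-region scene and all names are hypothetical.

```python
def discrete_relaxation(candidates, neighbors, compatible):
    """Repeatedly discard labels that some neighbor cannot support.

    candidates: dict part -> iterable of possible labels
    neighbors:  dict part -> list of neighboring parts (neighborhood model)
    compatible: function (part, label, nbr, nbr_label) -> bool
                (interaction model)
    """
    labels = {part: set(names) for part, names in candidates.items()}
    changed = True
    while changed:
        changed = False
        for part in labels:
            for name in list(labels[part]):
                supported = all(
                    any(compatible(part, name, nb, other) for other in labels[nb])
                    for nb in neighbors[part]
                )
                if not supported:
                    labels[part].discard(name)
                    changed = True
    return labels

# Toy scene: region A lies above region B. Allowed (upper, lower) pairs:
ALLOWED = {("sky", "ground"), ("sky", "tree"), ("tree", "ground")}

def compatible(part, name, nb, other):
    upper, lower = (name, other) if part == "A" else (other, name)
    return (upper, lower) in ALLOWED

candidates = {"A": {"sky", "ground"}, "B": {"sky", "ground"}}
neighbors = {"A": ["B"], "B": ["A"]}
result = discrete_relaxation(candidates, neighbors, compatible)
```

Each pass only consults a part's immediate neighbors, yet repeated passes let a constraint discovered in one place remove labels elsewhere, which is the iterative flavor that gives relaxation its name.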


1. Matching


The goal of relaxation is to match elements in the image to
elements of a model of that scene. Elements in the image can be
representative of such things as regions, segments, edges, etc., that have been
produced from earlier processing. Elements of a model of a scene are
constructed from knowledge about the type of imagery being processed.
Knowledge describes how elements in an image are related. These
descriptions (or rules) represent the model of the elements in a scene.

Let the model elements be denoted by units ($u_i$). The
interconnection between these units may be used to give a fuller description of
these elements. The model elements and their interconnection can then be
considered as a graph. Likewise a graph can be constructed for the
elements of the image (the names ($n_i$)). Relaxation attempts to match the
model graph, $M(u_i)$, to the image graph, $I(n_i)$.

The links between the elements (nodes) indicate the relations
between the nodes, i.e., above, below, larger than, smaller than. In
addition, for every node, $u_i$, there is a corresponding probability vector
assigning a measure of likelihood to the relation,

    $\vec{p}_i = [\, p_i(n_1), \ldots, p_i(n_N) \,]$    (33)

where N is the number of names in the image graph. The set of all vectors
$\vec{p}_i$ is called the stochastic labeling of the set of units U.

A formula can be defined to rate how well elements are matched
in each graph. The criteria can be based on a variety of methods; some of
the standard approaches use a simple weighted difference measure. The
range of the match formula is [0,1], so that a perfect match assumes a
value of unity and no match obtains a value of zero. Let us denote this
matching formula as,

    [Equation (34)]

with c being a constant equal to unity, and D(u,n) being the difference
measure summed over all features, defined as,

    [Equation (35)]

for f features, where $w_k$ is the weighting, $S_k$ is the assigned
strength, and V is the kth feature value.

The matching formula is used for two distinct purposes. First,
it is used to obtain the initial probabilities (i.e., the initial
likelihoods of particular assignments),

    [Equation (36), defining the initial probabilities $\vec{p}_i^{\,(0)}$]


Second, the matching formula is used in an adaptive mode to measure the
quality of a match. That is, the matching formula is needed to adjust for
the compatibility of the matching elements. Since the interaction between
elements $u_i$ and $n_j$ is based on their connection (relation) to other
elements, the compatibility function will be based on these relations
(the links between the node elements).

The compatibility function between every unit $u_i$ and each name
$n_k$ can generally be given as,

    [Equation (37), defining the compatibility vector $\vec{q}_i$]

Actually $\vec{q}_i$ can be thought of as representing what the neighbors
of unit $u_i$ "think" about how it should be labeled, whereas $\vec{p}_i$
represents what unit $u_i$ itself "thinks" about its own labeling. The
difference between $\vec{q}_i$ and $\vec{p}_i$ represents an inconsistency
and ambiguity that can be defined as the dot product between these two
vectors,

    $\vec{p}_i \cdot \vec{q}_i$    (38)


The goal, then, is to minimize this inconsistency function
(i.e., $\vec{p}_i = \vec{q}_i$ and $\vec{p}_i$ = unit vector). A global
measure can then be defined over the whole set of units by averaging the
local measures. By using the arithmetic average,

    [Equation (39)]

Finally an updating procedure is needed that will provide the
best estimate of $\vec{p}_i$. In order to achieve this locally, the global
inconsistency function can be minimized by attaching to every unit $u_i$ a
local gradient vector,

    [Equation (40), defining the local gradient vector $\vec{g}_i$]


where the coordinates of $\vec{g}_i$ are functions of the vectors attached
to the units $u_j$ in the set $M(u_i)$. The iterative scheme, used in
numerical analysis, to minimize the global inconsistency function can be
described as,

    [Equation (41), giving the updated vector $\vec{p}_i^{\,(n+1)}$]

where $\rho_n$ is a positive scale factor, and H is the negative of the
gradient given in Equation (40).

The procedure to implement the relaxation algorithm is shown in
Figure 61. The first step is to choose elements that are to be
matched. Next determine the initial probabilities using all available
features and relations with assigned model units using the matching
function. Then begin relaxation to obtain the "best" $p_i$, comparing it to
a threshold, t. If $p_i \ge t$ then make the assignment (match); otherwise
relax again.
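
The loop of initializing, relaxing, and thresholding can be sketched in Python. Because Equations (37) through (41) are not legible in this copy, the sketch does not reproduce the gradient-based update of the text. It substitutes the classical Rosenfeld, Hummel, and Zucker probabilistic relaxation rule, in which each probability is multiplied by one plus its neighborhood support and then renormalized. The compatibility function `r`, the threshold value, and the two-unit example are all assumptions made for illustration.

```python
def relax(p, neighbors, r, t=0.9, max_iter=100):
    """Iterate a probabilistic relaxation update until every unit has a
    name whose probability reaches the threshold t.

    p:         dict unit -> dict name -> initial probability
    neighbors: dict unit -> list of neighboring units
    r:         function (unit, name, nbr, nbr_name) -> value in [-1, 1]
    Returns (assignment, final probabilities).
    """
    for _ in range(max_iter):
        if all(max(pu.values()) >= t for pu in p.values()):
            break                                   # every unit can be matched
        new_p = {}
        for u, pu in p.items():
            raw = {}
            for name in pu:
                # q: what the neighbors of u "think" about labeling u as name
                q = sum(r(u, name, v, other) * p[v][other]
                        for v in neighbors[u] for other in p[v]) / len(neighbors[u])
                raw[name] = pu[name] * (1.0 + q)
            total = sum(raw.values())
            new_p[u] = {name: val / total for name, val in raw.items()}
        p = new_p                                   # synchronous update
    return {u: max(pu, key=pu.get) for u, pu in p.items()}, p

# Toy problem: two neighboring units must receive different names.
def r(u, name, v, other):
    return 1.0 if name != other else -1.0

initial = {"A": {"sky": 0.6, "ground": 0.4},
           "B": {"sky": 0.5, "ground": 0.5}}
neighbors = {"A": ["B"], "B": ["A"]}
assignment, final = relax(initial, neighbors, r)
```

In the toy problem the slight initial preference of unit A for "sky" is enough to pull unit B toward "ground", and both probabilities pass the threshold after a few iterations.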

Other researchers have used relaxation techniques [17, 49]
successfully. The major differences between them are: (1) some utilize the
same compatibility function but a different updating function; (2) some
define the compatibility function differently while using the same updating
function; and (3) some do not optimize d, so the updating function, while
being similar, is not as "smart."

2.  Relaxation  and  Parallelism 

Relaxation is a labeling process which requires a lot of
computation since the labeling needs to be performed on a large amount of data.
Similar to the way biological vision systems operate, a promising approach
is to make the processes parallel. This means that each part of the image
is to be analyzed and labeled independently of the others. One obvious
shortcoming of this approach is that the contextual information of
the image is not fully utilized, which can lead to propagating errors in
the image understanding process.

Figure 61. Symbolic Matching by Relaxation

A solution to this problem, which relaxation addresses, is to
assess the labeling probabilities for every part independently and then
compare each part's assessment to those of the other related parts, in
order to detect and correct potential inconsistencies. Since both the
assessment and the comparison can be done independently, we are still able
to perform parallel operations. To keep the computational cost as low as
possible, the comparisons should be local, i.e., among neighboring pixels,
while maintaining information flow by iterating the comparison process.
As previously mentioned, relaxation can be performed in parallel by using
Equation 41.

Researchers^ are currently developing an architecture to perform this
higher level vision processing algorithm. It is called the semantic
network array processor (SNAP). They are currently trying to implement
discrete relaxation.
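
For orientation, discrete relaxation can be sketched in software as
follows. This is a generic label-discarding formulation and makes no claim
about how SNAP implements it; the function name and the compatible
predicate are hypothetical.

```python
def discrete_relaxation(candidates, neighbors, compatible):
    """Discard labels that no neighbor can support.

    candidates : dict mapping each unit to its set of possible labels
    neighbors  : dict mapping each unit to a list of related units
    compatible : function (u, a, v, b) -> bool, True when label a on unit u
                 is consistent with label b on unit v
    """
    labels = {u: set(c) for u, c in candidates.items()}
    changed = True
    while changed:
        changed = False
        for u in labels:
            for a in list(labels[u]):
                # Label a survives only if every neighbor still offers
                # at least one compatible label.
                unsupported = any(
                    not any(compatible(u, a, v, b) for b in labels[v])
                    for v in neighbors[u]
                )
                if unsupported:
                    labels[u].discard(a)
                    changed = True
    return labels
```

The test for each label looks only at neighboring units, which is what
makes the method a natural fit for a cellular array.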

The SNAP architecture consists of an array of identical cells, each
containing a content addressable memory, microprogram control, and a
communication unit. Figure 62 shows the SNAP architecture. SNAP's
functionality is a product of two characteristics: associative processing
and cellular array processing. Each cell contains memory control logic
and communication logic. The cellular array is operated by the array
controller, which in addition provides an interface between SNAP and a
host computer. The cells can be microprogrammed so they can operate
independently. The cells communicate via local buses. A cell's address
is specified by its row number followed by its column number. Information
in a particular cell can be retrieved either by its content (as in
associative memories) or by the cell's address.
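
The two retrieval modes can be illustrated with a toy software model. It is
only a sketch of the addressing idea; the class and method names are
hypothetical and do not describe the SNAP hardware or its microcode.

```python
class CellArray:
    """Toy cell array: each cell is addressed by (row, column) and holds
    a small key/value store standing in for its associative memory."""

    def __init__(self, rows, cols):
        self.cells = {(r, c): {} for r in range(rows) for c in range(cols)}

    def store(self, row, col, key, value):
        self.cells[(row, col)][key] = value

    def by_address(self, row, col):
        # Retrieval by the cell's address (row number, then column number).
        return self.cells[(row, col)]

    def by_content(self, key, value):
        # Retrieval by content. In hardware every cell would compare in
        # parallel; this sketch simply scans the cells.
        return sorted(
            addr for addr, mem in self.cells.items() if mem.get(key) == value
        )
```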

Scene classification and interpretation are key processes in image
understanding. For two dimensional scenes this is generally regarded as a
matching process between features derived from the image and 2-D models.
To be more specific, it is not a simple match, but rather it is a process
of verifying that the model can give rise to a particular representation.
Many of the computational techniques that have been described to
accomplish this task have given excellent performance.

However, for more complex scenes, classification and interpretation
are much more difficult. Information regarding intrinsic characteristics
of the scene, as described by the 2 1/2 D sketch, provides valuable
information to the image understanding process. Robot vision is an
example, where the scene to be described is fundamentally three
dimensional, involving substantial surface relief and object occlusion.
As mentioned earlier, even two dimensional scenes that are complex in
nature are problematic in terms of classification and interpretation.

The objective of high level processing for image understanding is to
operate on the intermediate representations (i.e., the full primal sketch,
2 1/2 D sketch) derived from earlier vision processes for scene
recognition, classification, and interpretation. The operations performed
in high level processing attempt to group earlier processes into a form
that is amenable to symbolic processing. Once the scene is restructured
into symbolic form, the final step is to give the system "knowledge." The
approach taken has been to "program" into the machine vision system a
reasoning capability using artificial intelligence techniques.

Symbolic representations of image elements provide the easiest way
for a computer to understand a complex scene. Symbols supply information
such as form, structure, and groupings, whereas processing over symbols
supplies information such as relations, occurrences, and matching.
Symbolic processing over image elements allows the vision system to make
inferences about a scene, similar to the way a human would.

The human visual system is able to accept information from multiple
sources, fuse the information into a pool of knowledge, reason across
the knowledge, predict which of the sources will provide the most
beneficial amount of new information, and direct resources to those
sources to process new information. Often this information is the result
of partial scanning, resource-limited processing, heuristic beliefs, and
probabilistic assumptions. Thus, human processing uses information which
is incomplete, uncertain, even incorrect. Each piece of information thus
has some amount of belief associated with it. In other words, the belief
is based on evidence. Making an inference about the world based on
beliefs requires not only reasoning over the information but reasoning
about the belief of the information and the evidence that the belief is
based on.

The  goal  of  an  intelligent  machine  vision  system  is  to  manage  the 
information  that  it  gathers,  similar  to  the  way  a  human  would.  The  vision 
system  must  be  able  to  fuse,  evidentially  reason  over,  and  control  the 
processing  of  a  variety  of  information  that  may  be  uncertain,  incomplete, 
and  even  incorrect. 

The two elements that are essential for high level processing are:
(1) knowledge representation, and (2) association/conversion between
knowledge representations and scene domains. Both of these processes
involve some form of contextual processing. The purpose of knowledge
representation is to store facts, relationships, and strategies for
deriving them. A variety of knowledge representation schemes exist. Some
of them include: (a) semantic networks; (b) object-attribute-value
triplets; (c) rules; (d) frames; (e) logical expressions; (f) 2-D and 3-D
structural representations; and (g) iconic. For example, a semantic
network represents objects and relations between objects as a graph
structure of nodes and labeled arcs. The arcs usually represent relations
between nodes. Semantic nets are especially attractive as analogical
representations of spatial states of affairs.
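
As an illustration of the graph structure just described, a semantic
network can be held as a table of labeled arcs. This is a minimal sketch;
the class name, method names, and the example relations are hypothetical.

```python
from collections import defaultdict


class SemanticNet:
    """Nodes are objects; each labeled arc is a relation between two nodes."""

    def __init__(self):
        self.arcs = defaultdict(list)  # node -> [(relation, target node)]

    def add(self, source, relation, target):
        self.arcs[source].append((relation, target))

    def related(self, source, relation):
        # Follow every arc with the given label that leaves `source`.
        return [t for r, t in self.arcs[source] if r == relation]


net = SemanticNet()
net.add("turret", "above", "tread cover")
net.add("tread cover", "above", "wheels")
net.add("turret", "part-of", "tank")
```

A spatial query such as net.related("turret", "above") then follows the
arcs labeled "above" that leave the "turret" node.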

The association/conversion between knowledge representations and scene
domains is basically a matching process. The matching is conducted
between a model of the scene, derived from prior knowledge about the type
of imagery, and the representations derived from earlier vision processes.

A conceptual functional architecture for high level processing for
image understanding is given in Figure 63. The architecture consists of
the  following  five  modules:  (1)  feature  organizer;  (2)  predictor;  (3) 
planner;  (4)  inferencer;  and  (5)  matcher.  Each  of  these  elements  will  be 
discussed  in  the  following  paragraphs. 

The feature organizer derives relevant groupings and structures from
an image without prior knowledge of its contents. The predominant
processes that are used to perform these tasks are segmentation and
feature extraction. The type of process depends on a number of factors,
including data type, computational complexity, etc. The feature organizer
performs two functions. First, it partitions an image into sets of
related features, which reduces the search space required for model
selection and matching. Second, the relations formed by these processes
can serve as reliable indices to access both model and world knowledge
bases.

The predictor applies expectations concerning which objects are likely
to be present in an image and the way in which they might manifest
themselves. The predictor aids in the matching process.

The planner generates a sequence of actions intended to facilitate
the image understanding process. Planning is the process of ordering the
operations such that their actions and resource needs do not conflict.
The  planner  aids  in  making  decisions  as  to  what  sources  of  information 
should  be  concentrated  on,  what  processes  to  run,  where  and  when  to  run 
them,  what  parameters  to  use,  and  which  data  to  pass  along. 

Several planning strategies are applicable to high level processing
for image understanding. Hierarchical planning strategies involve a
detailed description of the planning steps as well as a high-level
conceptual specification, so that there is a hierarchy of representations
of a plan. The hierarchical planning procedure operates from the
abstract to the more specific so that the plan development is not
initially computationally overwhelming. Given a high level goal with
constraints, the goal is expanded into a partially ordered net of subgoals
and actions which will achieve the original goal. Then the subgoals are
examined and compared to the internal world model to see if they are true.
If they are not, the net is examined to determine if reconstructing the
plan will make the subgoal true, and if not, the subgoal is made true by
expanding it into more subgoals. This process continues until all goals
are true. Intermittently the plan-net is examined for conflicts between
goals (i.e., attempting to use the same resources at the same time).
Often conflicts are resolved by reordering the plan-net. Finally, goals
are kept parallel until a necessary ordering can be determined, and
variables are not arbitrarily bound but have constraints placed on them
until a correct value is found.
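
The top-down expansion step alone can be sketched as follows. This is a
deliberately reduced illustration: it expands goals against a world model
but omits the partial ordering, conflict detection, and constraint posting
described above. The function name and the methods table are hypothetical.

```python
def expand(goal, world, methods):
    """Return the primitive actions needed to make `goal` true.

    world   : set of goals already true in the internal world model
    methods : dict mapping an abstract goal to its list of subgoals
    """
    if goal in world:
        return []  # already true, nothing to plan
    if goal not in methods:
        return [goal]  # no further expansion: treat as a primitive action
    plan = []
    for subgoal in methods[goal]:
        plan.extend(expand(subgoal, world, methods))
    return plan


methods = {
    "classify tank": ["segment image", "group regions", "match model"],
    "segment image": ["enhance", "threshold"],
}
plan = expand("classify tank", world={"enhance"}, methods=methods)
```

Because "enhance" is already true in the world model, the resulting plan
skips it and keeps only the steps that remain to be done.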

Non-hierarchical planning strategies generate only one representation
of a plan. This technique does not distinguish between problem solving
actions and other actions, and may employ goal reduction to simplify the
problem. The non-hierarchical system may also use means-ends analysis to
reduce the differences between the current state of the world and the
result of executing the plan. Non-hierarchical planning strategies are
usually implemented on simpler vision systems or systems which can be
strictly defined.

The matcher forms relations between internal representations of
visual information and world knowledge for the purpose of scene
recognition and interpretation. Matching can occur between numerous
representations of an image including iconic, geometric, and relational
structures. Matching at the iconic level is carried out by template
matching, as discussed in a previous section. Matching at the geometric
level involves creating a correspondence between data and a parameterized
model. This matching is a process which determines which parameters of
the model are best utilized for a meaningful representation of the data.
An example of this type of matching is the feature extraction process
used in statistical pattern recognition. A third form of matching
involves relational structures. Since relational structures are often
represented as graphs or networks, the matching problem can become one of
graph or subgraph isomorphism. Some techniques include enumerative search
of the tree of possible matches between nodes, and parallel-iterative
refinement applied during search. An example of this type of matching is
relaxation. Generalized structure matching may also be used where
isomorphism is not possible due to missing or uncertain information. An
example of this type of matching process was demonstrated in the syntactic
pattern recognition process.
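
Enumerative search over the tree of possible node matches can be sketched
as a backtracking procedure. The sketch below is an assumption-laden
illustration: it requires every labeled arc of the model to appear in the
data (a non-induced subgraph match), and the function name and arc format
are hypothetical.

```python
def find_match(model_edges, data_edges):
    """Map model nodes to distinct data nodes so that every labeled model
    arc (a, relation, b) also appears in the data. Returns a dict, or None
    when no such mapping exists."""
    model_nodes = sorted({n for a, _, b in model_edges for n in (a, b)})
    data_nodes = sorted({n for a, _, b in data_edges for n in (a, b)})
    data = set(data_edges)

    def consistent(mapping):
        # Check only the arcs whose two end nodes are already mapped.
        return all(
            (mapping[a], r, mapping[b]) in data
            for a, r, b in model_edges
            if a in mapping and b in mapping
        )

    def search(i, mapping):
        if i == len(model_nodes):
            return dict(mapping)
        for cand in data_nodes:
            if cand in mapping.values():
                continue  # each data node may be used only once
            mapping[model_nodes[i]] = cand
            if consistent(mapping):
                found = search(i + 1, mapping)
                if found is not None:
                    return found
            del mapping[model_nodes[i]]  # backtrack
        return None

    return search(0, {})
```

The search is exponential in the worst case, which is one reason
parallel-iterative refinement is attractive for pruning candidates first.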

Control mechanisms are incorporated to ensure that the inference
process proceeds in a logical fashion. Several techniques for control are
relevant to high level processing. They are: (1) data-driven vs.
goal-driven; (2) hierarchical vs. heterarchical; and (3) parallel vs.
serial.

Finally, the inference module is used to deduce the presence or
absence of features within an image and for the ultimate recognition and
interpretation of objects and scenes. A number of inference mechanisms
are applicable to high level processing for image understanding.
Predicate logic is a system for expressing propositions and deriving
consequences of facts. Production systems are sets of "if ... then ..."
or situation-action pairs, which are more flexible than first order
predicate logic. Labeling schemes are another form of the inference
process that is used to assign labels to images in a probability-like
fashion. We have seen labeling schemes when we discussed relaxation.
Finally, active knowledge uses procedures as the elementary units of
knowledge.

The five modules that have been described represent the functional
approach that is being used for high level processing. Whereas in low
level processing the source of data was scene primitives and features, and
in the intermediate level the processes involved grouping these primitives
and features into symbolic form, high level processing involves operating
over these symbolic representations. Generally, these symbolic
representations are in the form of a semantic expression. Consequently,
the type of operations performed involves contextual processing. The
languages that are being used to perform this type of processing are LISP
and PROLOG. These languages are well suited for symbolic data processing.

One of the important considerations for researchers working on
prototype vision systems has been the design of a robust matching module.
LISP has been used successfully as a pattern matcher. Pattern matching is
the process of comparing symbolic expressions to see if one is similar to
another. Although LISP has no pattern matching built in, LISP makes it
easy to write pattern matching procedures. The general strategy to
accomplish this is to match a pattern to a model (datum) using some of the
symbol-manipulating functions and matching procedures.
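
The strategy of matching a pattern to a datum can be sketched as follows.
Python is used here purely for illustration in place of LISP; the "?"
prefix for pattern variables and the function name are assumptions of the
sketch, not conventions taken from this report.

```python
def match(pattern, datum, bindings=None):
    """Return a dict of variable bindings if `pattern` matches `datum`,
    otherwise None. Strings starting with "?" are pattern variables."""
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            # A variable seen before must match the same value again.
            return bindings if bindings[pattern] == datum else None
        return {**bindings, pattern: datum}
    if isinstance(pattern, (list, tuple)) and isinstance(datum, (list, tuple)):
        if len(pattern) != len(datum):
            return None
        for p, d in zip(pattern, datum):
            bindings = match(p, d, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == datum else None


result = match(["above", "?x", "?y"], ["above", "turret", "tread-cover"])
```

The bindings returned by a successful match are what a rule interpreter
would substitute into the action part of a rule.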

There are other facilities of LISP that make it attractive in
designing the reasoning modules used in high level processing. Production
systems or "if ... then ..." rules can be efficiently constructed in a
LISP environment to provide an inferencing capability. Production systems
are popular because they offer modularity and incremental growth
characteristics.

A production system has three general components: (1) a data base;
(2) a set of rules; and (3) an interpreter for the rules. Of particular
importance is the matching of rules or patterns to the data base.
Procedures in LISP for pattern-directed matching are called demons.
Demons are procedures that are activated automatically when a value is
placed in, removed from, or modified in a database. Demon constructs have
been an important development in designing a robust vision system.
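
The three components, together with a demon, can be sketched as follows.
This is a minimal forward-chaining illustration written in Python rather
than LISP; the class, the rule format, and the example facts are all
hypothetical, and the demon here reacts only to additions.

```python
class Database:
    """Data base of facts; demons are called whenever a fact is added."""

    def __init__(self):
        self.facts = set()
        self.demons = []  # callables taking (database, new_fact)

    def add(self, fact):
        if fact not in self.facts:
            self.facts.add(fact)
            for demon in self.demons:
                demon(self, fact)  # activated automatically on insertion


def run(db, rules):
    """Interpreter: fire rules until no rule adds a new fact.

    Each rule is a pair (conditions, conclusion)."""
    fired = True
    while fired:
        fired = False
        for conditions, conclusion in rules:
            if set(conditions) <= db.facts and conclusion not in db.facts:
                db.add(conclusion)
                fired = True


rules = [
    (("has turret", "has treads"), "is tank"),
    (("is tank",), "is vehicle"),
]
log = []
db = Database()
db.demons.append(lambda database, fact: log.append(fact))
db.add("has turret")
db.add("has treads")
run(db, rules)
```

New rules can be appended to the list without touching the interpreter,
which is the modularity and incremental growth noted above.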

It would be very instructive to provide an example of the steps used
in an image understanding process. This example is not meant to represent
the most efficient processes nor the approach that would be taken in a
real-time application but rather is intended to be pedagogical.

Figure 64 (a) is an infrared image of a tank. Our purpose is to
separate the tank from the background and to classify which type of tank
it is. We can apply a segmentation algorithm that is sensitive to a
particular characteristic of this type of imagery. The result of the
segmentation algorithm would be a blob representation whose shape would
grossly resemble the shape of the tank. However, information is lost as
to the particular arrangement of parts of the tank, for example, the
wheels, tread cover, and turret. This is a common problem in an image
understanding system.

In order to produce a reliable segmentation procedure it would be
beneficial to use cues about the way image features are structured in the
world. These cues are in the form of knowledge rules which reflect the
way in which humans group objects. These rules are based on Gestalt
psychology. For example, characteristics such as proximity, similarity,
collinearity, and containment can be used as reasoning cues to aid in the
image understanding process. This will enable easier classification and
interpretation.

Figure 64 (b) is a primal sketch representation of the tank which is
used to show the primitive characteristics of the image. As can be
readily observed, additional processing is needed to interpret this scene.
Figure 64 (c) is the result of further processing where the tank is
segmented from the background but there is confusion as to the arrangement
of tank regions. For example, the segmentation algorithm produced three
separate regions for the tread cover. Obviously, there is a need to
cluster these regions into a homogeneous grouping. Once this is
accomplished it will be easier to proceed to the next step in the image
understanding process.

Grouping the regions into a symbolic form is considered an
intermediate level processing step. This representation will make it
easier to perform high level processing for scene understanding.
Figure 64 (d) shows an arrangement of regions that have been arbitrarily
assigned, which has resulted from the segmentation algorithm. Applying
the grouping rules mentioned earlier in a well defined order produces the
graphics in Figures 64 (e) and 64 (f). Figure 64 (g) is the result of the
final clustering procedure where the tank is segmented into three separate
groupings relating to the three distinct regions of a tank - the wheels,
tread cover, and turret.
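
The clustering of fragmented regions can be sketched with two of the
grouping cues, proximity and similarity. The sketch is illustrative only:
the region format (centroid plus a single "kind" attribute), the distance
test, and the function name are assumptions and do not reproduce the
procedure used to generate Figure 64.

```python
def group_regions(regions, max_gap):
    """Merge regions of the same kind whose centroids lie within max_gap.

    regions : dict mapping a region name to (x, y, kind)
    Returns a sorted list of groups, each a sorted list of region names.
    """
    parent = {name: name for name in regions}

    def find(n):
        # Follow parent links to the representative of n's group.
        while parent[n] != n:
            n = parent[n]
        return n

    names = sorted(regions)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (xa, ya, ka), (xb, yb, kb) = regions[a], regions[b]
            near = (xa - xb) ** 2 + (ya - yb) ** 2 <= max_gap ** 2  # proximity
            if near and ka == kb:  # similarity
                parent[find(b)] = find(a)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


regions = {
    "r1": (0, 0, "dark"),
    "r2": (1, 0, "dark"),
    "r3": (2, 0, "dark"),
    "r4": (1, 5, "bright"),
}
clusters = group_regions(regions, max_gap=1.5)
```

Here r1 and r3 are too far apart to merge directly but end up in the same
group through r2, much as the three tread cover fragments are clustered
into one region.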

Figure  64  (h)  is  a  tree-like  structure  which  represents  the  steps 
taken  in  region  clustering.  This  symbolic  representation  is  in  a  form  of 
a  semantic  net  which  aids  in  high  level  processing.  The  primal  node 
groups  all  the  clusters  of  nodes  into  a  single  node  which  represents  the 
entire  image.  The  other  nodes  are  clustered  according  to  the  rules 
mentioned  earlier.  Each  node  is  a  symbolic  representation  of  a  region  and 
its  attributes.  Records  are  kept  as  to  the  relation  of  one  node  to  the 
next.  Evidence  is  accrued  relating  to  these  records  for  merging  purposes. 
Finally, matching this tree-like structure to a model results in
classification of the tank.

Figure  65  summarizes  the  steps  for  scene  understanding  in  the  tank 
example.  Of  particular  importance  is  where  knowledge  is  incorporated  into 
the  processing  steps.  For  this  example,  which  can  be  indicative  of  other 
cases,  knowledge  has  been  implemented  at  segmentation.  In  some  cases  it 
should  be  implemented  as  early  as  preprocessing.  In  addition,  since  the 
image understanding process is goal-driven as well as data-driven,
knowledge that is incorporated into the semantic net for high level
processing can also be returned to other lower level processing for
planning purposes.

CHAPTER VII

CONCLUSIONS  AND  DIRECTIONS  FOR  FURTHER  RESEARCH 

This report investigated the various processes used in image
understanding by computer as well as the implementation of these processes
in a
parallel  environment.  Table  1  provides  a  general  outline  of  the  major 
topics  that  have  been  addressed. 

The  areas  in  which  image  understanding  (IU)  systems  are  finding 
application  are  growing.  For  example,  expert  systems  are  being  designed 
to  aid  a  trained  individual  in  analyzing  different  types  of  imagery  from 
remote  sensors,  for  both  commercial  and  military  functions.  Another 
popular  example  is  in  the  area  of  robotic  devices  which  are  equipped  with 
visual capabilities. Finally, the military is interested in incorporating
image understanding methodologies into their imaging systems to create
more effective and reliable systems.

Implementation of image understanding algorithms is an important
consideration. Many of the algorithms which are well-suited for
parallelism have been cited. In general, the algorithms which perform
"low-level" operations are those which can be mapped onto a parallel
processing architecture. This results from the fact that most of these
processes operate on a local neighborhood of pixels. Obviously, many of
the promising parallel architectures being designed today, or planned in
the future, can have a direct impact in fostering the development of image
understanding algorithms for computer vision.

The benefit of using an optical processing environment has also been
expounded. Again, many of the applications center around using optics for
"low-level" operations. Obviously, the only practical benefit of using
optics for image understanding systems is to implement those functions
that optical processing performs so well, i.e., correlations and Fourier
transform operations. At this time, the general consensus is that an
optical correlator is a very powerful processor, and as such, it will be
difficult for new image processing parallel architectures to surpass it.
The emphasis, then, should be on using optics where optics has a clear
advantage. This can be realized by building hybrid digital/optical
systems, where front-end processing is performed by an optical system and
intermediate to high level processing is performed by an intelligent
back-end digital processing environment. It is important to keep in mind
that even though many researchers have analyzed the true effectiveness of
optical processing, in terms of accuracy and repeatability, a proper
matching of algorithms to hardware is still needed.

Implementation of the intermediate to high level processes for image
understanding in a parallel environment becomes more difficult. The
parallel architectures used to implement some of the major algorithms
commonly used for intermediate level processing have been described. In
general, the more symbolic and knowledge oriented and the less numeric the
processing becomes, the more remote the possibility of a parallel
implementation. This is especially true for high level processing.

Most of the research efforts in image understanding have been geared
towards three general areas: (1) developing processes which create rich
descriptions of a scene for use in higher level processing; (2) extracting
intrinsic characteristics from an image; and (3) improving the techniques
used in higher level symbolic processing. Higher level processing can be
much more efficient if lower level processes produce rich representations
of a scene. This has prompted research into developing processes which
can extract more information as well as symbolic representations from a
scene as early in the processing stages as possible. Many of these
requirements imply improving segmentation techniques. Creating rich
representations of a scene during early processing is also a result of
top-down control, i.e., where low level processes, such as image
enhancement, are monitored by high level requirements.

Extracting intrinsic characteristics from an image, such as depth and
orientation, is a key process in image understanding. Some of the
approaches taken were discussed in Chapter IV. These characteristics
supply valuable information about the visual scene. They can also be used
to build three-dimensional geometries of visual surfaces to further aid in
scene understanding.

Finally, improvements in higher level processing can be examined in
three areas. First, proper characterization of the knowledge base is
essential to an image understanding system. An image understanding system
has to be designed with an accurate knowledge base. This entails a
thorough understanding of the problem domain and the parameters for its
solution. Second, where knowledge is incorporated into the image
understanding process is a crucial consideration. Generally, inputting
knowledge functions in as many places as possible is attractive. However,
there is a limit to the overall benefit of such an approach in terms of
efficiency, effectiveness, and cost. Third, maintaining a modular system
configuration is beneficial in that it allows flexibility in building
large and more reliable systems. Modular design is a system concept that
can be implemented in the form of rules particular to the high level
processing function. Modularity is also attractive in that it promotes
creative system design.

A.  DIRECTIONS  FOR  FURTHER  RESEARCH 

Frustrations with initial attempts at automated image understanding
made it clear that vision is so complex a computational task that it is
unreasonable, either logically or pragmatically, to conceive it as
occurring in a single processing step. As we have shown, visual
processing is best thought of as being distributed over a series of
computational stages. This concept was eloquently expressed by Marr.1
However, one of the major problems in applying this methodology to
computer vision has been the lack of a computational approach that unifies
all the processing stages. We can refer to this as a unifying
computational theory of vision. This has been one of the major reasons
why machine vision has been such a difficult problem to solve. Many of
the approaches that have been used in the past simply combine a sequence
of processing techniques that seemed appropriate for a specific
application. However, as the scenes to be analyzed become increasingly
complex, this approach quickly breaks down. Researchers have come to the
realization that a much higher level of understanding is needed to
construct this computational framework.

In order to develop a complete understanding of the visual processes
and representations that should be used in creating a unifying
computational theory of vision, these elements are needed: (a) a
computational theory; (b) the representation and the algorithm; and
(c) the mechanism. Marr and his predecessors have already expounded on
the mechanistic guidelines. The mechanistic theories are based on
biological visual processes which are used to constrain possible theories
and guide the development of algorithms. The computational theory is an
attempt to create a general theoretical framework which encompasses all
levels of processing in order to build rich surface representations for
scene understanding. Finally, the representation and the algorithm are
the result of transforming a theoretical idea into an engineering concern,
i.e., building it.

A process that attempts to provide a unifying computational theory
is relaxation. Relaxation bridges the gap between all processing stages
since it can be implemented in all three levels of processing. For
example, it has been used for edge detection, edge linking, optical flow,
line labeling, shape from shading, semantic net matching, etc. Relaxation
also maps nicely into a local and highly parallel process. This makes it
very attractive for parallel implementation, as was discussed. Finally,
neuroscientists have conducted numerous experiments which suggest that
visual mechanisms also implement local and highly parallel processes.
Although relaxation seems very promising, it is not the solution in
itself. Additional research is needed to discover a computational theory
which reconstructs from image primitives a "complete" representation of a
scene.

AN ORGANIZATIONAL FRAMEWORK FOR COMPARING ADAPTIVE ARTIFICIAL INTELLIGENCE SYSTEMS


Teresa A. Blaxton and Brian C. Kushner


One of the most interesting topics of
investigation in the field of AI is machine
learning, where systems are being made to
automatically "adapt" or learn over time. The
following paper presents a common framework for
organizing several diverse adaptive AI systems.
Nine systems are discussed with regard to: (a)
the types of knowledge representations they employ;
(b) storage; (c) retrieval mechanisms; (d) conflict
resolution principles; and (e) means of adapting
knowledge and control structures as a function
of changing experience.

... organizational structure will be suggested for
this body of research, hopefully lending some
insights into directions for future development.


There is a new trend evolving in artificial
intelligence which directly impacts the work
being done on expert systems. More and more,
people are becoming unsatisfied with constructing
static knowledge bases which are obsolete almost
as soon as they are completed and must be updated
on a continual basis. The alternative being
explored is that of creating systems which adapt
with the addition of new knowledge, not only
learning new facts but actually changing their
own memory organizations and control structures
to accommodate the new data more efficiently.1'2'^
Such systems are potentially more user oriented,
more flexible to operate, and are less costly
from a software/life-cycle support viewpoint.


Thla  organizational  framework  haa  several 
parts,  all  focusing  on  issue*  relating  to  th* 

incorporation  and  utllizatloi.  of  knowledge  U 
ASa.  In  term*  of  distinction*  between  various 
ASa,  th*  type  of  knowledge  acquired  by  these 
ayatema,  aa  well  aa  how  thi*  knowledge  Is 
represented,  are  th*  aspect*  moat  aimllar  to 
those  of  traditional  expert  ayatema.  Other 

familiar  elements  of  thla  framework  Include  the 
retrieval  operations  used  by  the  AS  to  access 

previously  stored  information,  and  the  mechanises 
used  to  control  the  processing  In  the  ayatee 
But  that  1*  where  th*  similarity  end*.  atnee 
both  the  mcchsnlam*  by  which  Incoming  knnwledse 

1*  automatically  stored  in  the  system,  and  th» 
•  trategle#  for  modifying  or  adapting  the  knowledje 
base  and  control  atructure,  are  by  necesait' 

unique  feature*  of  theae  adaptive  ayatema.  These 
criteria  will  be  discussed  In  greater  detail 
In  aubaequent  aectlona  of  thla  paper,  where  vs 
will  In  turn  (a)  eatabllah  the  need  for  bulldlrr 
ASa;  (b)  outline  a  framework  of  the  concerns 
on*  night  face  when  trying  to  build  an  AS;  (c) 
compare  and  contrast  several  already  exlatlrt 
AS*;  and  (d)  lay  out  some  guiding  principles 

for  building  an  "ideal"  AS. 


Problem*  With  Traditional  Expert  Syatema 


Although  aeveral  reaearcher*  have  attampted 
to  build  adaptive  eyetemt  (ASa),  they  have 
un  (or  t  unn  t  r  1  y  done  so  without  the  benefit  of 

any  guiding  theory.  Thoae  frameworki  that  have 
been  offered  have  not  been  sufficiently  general 

to  encompaa*  the  wide  variety  of  ayacemi  that 

Itii  vc  been  developed,4'^'*  Consequently,  Che 
literature  addresses  AS*  from  multiple  perspectives 
and  with  inconsistent  terminology,  making  it 
difficult  to  compare  the  accomplishment*  of  theae 
systems  with  one  another.  In  thla  article,  an 


Many have argued that expert systems are the one area in which the AI
enterprise has been truly successful [8, 9]. These claims are based
upon the proven utility of some computerized expert systems that are
used to aid human experts in solving problems within limited
application domains [10, 11]. Despite these successes, there are still
many nontrivial problems associated with building and maintaining
expert systems, a few of which will now be enumerated.


To begin, the process of knowledge engineering, whereby the knowledge
base of an expert system is acquired, is fraught with difficulties
[12]. At worst, the availability of the main expert or experts may be
limited or inadequate during the knowledge base construction period.
Barring this eventuality, the experts that are available may disagree
as to what knowledge to include, leaving the knowledge engineer at a
loss as to how to resolve the dilemma. Perhaps more seriously, the
knowledge engineer is faced with the task of obtaining procedural
knowledge from the expert who can report only declarative knowledge.
That is, the expert can declare that something is true, but cannot
accurately describe how to do it.

Aside from problems encountered in the construction of traditional
expert systems, a host of difficulties arise once the system is
completed. Foremost among these is the challenge of updating the
knowledge base to keep it current while maintaining truth value in the
system. This becomes particularly acute as the size of the knowledge
base increases. The practice of modularizing the knowledge base into
separate sections and assigning each to a different knowledge engineer
has eased this situation somewhat [11]. However, most would agree that
there is still room for improvement, since there are still problems
associated with maintaining knowledge consistency and coordination
across these modules. Perhaps an even larger concern is that most
expert system formulations offer no provision for automatically
augmenting the knowledge base and control structure as the need
arises. These difficulties, coupled with the often limited domain of
applicability of most systems, suggest that investigation of an
alternative approach is warranted.

Recent Advances: Adaptive Systems

As was already mentioned, recent work in the area of adaptive systems
has led to theoretical advances over traditional expert system
formulations. If realized on a large scale, ASs would be preferred to
expert systems for several reasons. First, ASs may be implemented
without access to an "expert" per se. That is, the system may start
out at a novice knowledge level, and gradually build up to a higher
level of expertise through experience and through interaction with
some external "knowledge sources." Such a system would be useful in
so-called "cutting edge technology" domains where the current
knowledge base is small, but expected to grow.

In terms of the user/system interface, ASs may be particularly
beneficial in the development of individualized systems where commands
accepted by the system are customized for a small set of users. In
addition, one might imagine that an AS that has itself progressed from
the novice to expert stages would provide more understandable
responses to queries made by a novice user than would a traditional
expert system.

In a more global sense, ASs are preferable to traditional expert
systems for the simple reason that intelligence is dynamic, and in any
domain of interest the knowledge and heuristics utilized are bound to
change with time. A major amount of effort has been expended in the
past to find domains which are suitably narrow and static for building
expert systems. In spite of this, the systems that have been developed
must still be updated and augmented fairly frequently, and hence
require costly, manpower-intensive support tails. One way to avoid
this pitfall is to build systems that adapt. Therefore, it is our
belief that the performance of the system will be improved to the
degree that the system and its knowledge base adapt to the gradual
evolution of the domain itself.*

Much of the pioneering work on ASs has been conducted by cognitive
psychologists interested in modeling different aspects of human
cognition. These researchers have experimented with the application of
learning principles to AI systems in domains as diverse as number
categorization, puzzle solving, language acquisition, and manipulation
of geometric figures. Due to the disparity of these topics, readers of
this literature may sometimes find it difficult to apprehend relevant
similarities and differences among these systems. The organizational
framework presented here is intended to bring a certain order to this
chaos by providing points of comparison common to all of these
systems, varied though they may be. For purposes of the present paper,
the framework will be discussed in regard to only a subset of ASs.**
Before presenting the framework, each of the ASs to be discussed will
first be briefly described.

Representative Cross Section of Adaptive Systems

The first AS to be described is called ACT* and was published by John
Anderson [1]. Intended as a comprehensive theory of human cognition,
ACT* models such complex activities as rotating mental images, solving
geometry proofs, retrieval processes involved in reading, and language
acquisition. It is by far the most fully developed system to be
described in this article. Since ACT* was designed to model human
cognition, it may embody some constraints that are unattractive in the
realm of AI applications. Nevertheless, it is argued that there is
much to learn about building ASs from the study of such a system.


* It has been argued elsewhere that building ASs is not necessarily a
worthwhile enterprise in that learning can be a long and iterative
process, perhaps requiring more effort than is merited [11]. It is our
position, however, that the long term benefits realized from the
construction of ASs in dynamic domains will far outweigh any initial
startup costs.

** As the reader may notice, the nine systems described in this
article constitute only a small number of those programs presented
elsewhere as ASs. Some of those other systems have been eliminated
from our present discussion because they are better labeled as
frameworks for knowledge representation rather than as full-blown
systems. Still others were omitted because they are not truly adaptive
in the sense of having mechanisms responsible for reorganizing memory
and control structures. Finally, those deemed too narrow in scope to
be of general interest were not included for discussion [17].

The BACON.5 program written by Langley, Bradshaw, and Simon was
intended for a very different purpose. Given numeric data and variable
specifications, it notices patterns, abstracting out regularities
among combinations of variables as "concepts". BACON.5 has discovered
the Ideal Gas Law, Ohm's Law, and the law of gravitation, among
others. A program somewhat similar to BACON.5 is UNIMEM [16]. It
categorizes and makes generalizations from numeric data without using
any statistical heuristics such as those employed in the BACON.5
system. For example, given data on areas and populations of states, it
categorizes them into "large" and "small" classes and forms
appropriate generalizations about those classes. For instance, it
generalizes that small states have small populations without
incorrectly inferring that large states necessarily have large
populations (e.g., Alaska).

The next three systems are similar to one another in that they all
start out with little or no knowledge about a given topic and progress
up to an expert level. Kolodner's [17, 2, 18] Computerized Yale
Retrieval and Updating System (CYRUS) learns facts about Cyrus Vance
and Edmund Muskie's terms as Secretary of State. The Integrated
Partial Parser, IPP, learns about international terrorism from
newspaper articles. Finally, a follow-up to IPP called RESEARCHER
learns and answers questions about patent abstracts.

For purposes of the present paper, the most notable aspects of all of
these systems are that they have mechanisms for organizing and
updating their knowledge bases such that interesting generalizations
and discriminations result. For example, after learning about a number
of instances in which kidnap victims in Italy happened to be
businessmen, IPP is able to generalize that a new (unidentified)
Italian who is kidnapped is likely to be a businessman. IPP does not
make the error of the converse, however, which would be to infer that
a given kidnapping took place in Italy simply because the victim is a
businessman.

An example of an AS in a very different domain is the Blocks World
program, first introduced by Winograd [20] and then updated by Winston
[21]. This system learns about various possible configurations of a
set of geometric figures. For example, Winograd's [20] system can
remember sequences of moves for any given object in the blocks world
and infer procedures necessary for those sequences to have been
performed, even though that information was not explicitly stored.

The last two ASs learn rules about how to solve some problem. The
highly acclaimed Meta-DENDRAL [22] is a program which sits over the
expert system DENDRAL and learns cleavage rules used by mass
spectrometers. The second version of the Strategic Acquisition
Governed by Experimentation system, SAGE.2 [23], learns rules for
solving the Tower of Hanoi puzzle task. Starting out with a small
number of heuristics, this program quickly acquires the knowledge
necessary to solve the puzzle in the minimum number of moves.


Having read these descriptions, the reader should now have an
appreciation for how diverse these ASs really are. Nevertheless, it is
argued that an organizational framework may be used to view these
systems which will allow their simultaneous comparison on a number of
dimensions. That framework will now be presented.


Framework for Comparing Adaptive Systems


The ASs just described may be compared with regard to the following
features: (a) the types of representation schemes employed; (b) the
mechanisms by which incoming knowledge is stored in the system; (c)
retrieval operations used to access previously stored information; (d)
mechanisms used to control information processing in the system; and
(e) strategies for adapting the knowledge base and control structure
to accommodate the changing environment. In this section, our set of
ASs will be discussed with regard to each of these features. Following
this, an evaluation of the different designs employed in building
these ASs will be presented.


A. Type of Knowledge Representations Used


1. Production Rules


The first type of representation listed in Table 1 is the production
rule. Production rules are if-then or condition-action pairs which
invoke a particular action to be carried out when the contents of
working memory match a specified condition. By this description it is
clear that production rules embody procedural knowledge, but they are
closely tied to declarative knowledge as well. Consider the following
example of a production rule in the BACON system:


If you see a number of descriptions at level L
   in which the dependent variable (D)
   has the same number value (V),

Then create a new description at level L+1
   in which the value of D is also V and
   which has all conditions common to
   the observed descriptions.

This rule results in the creation of another production, which will,
in some sense, represent declarative knowledge within its own
conditions. When these new conditions are matched by the contents of
working memory, the new rule will be eligible for implementation.
Production rules are used in ACT*, BACON.5, Meta-DENDRAL, and SAGE.2.
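The condition-action structure of the BACON rule quoted above can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions: the dictionary layout of a "description", the
"level" key, and the function name generalize_constant are hypothetical
and are not taken from BACON.5 itself.

```python
def generalize_constant(descriptions, dependent):
    """Hypothetical sketch of the quoted rule.

    If every description at level L gives `dependent` the same value,
    return a level L+1 description holding that value plus all
    conditions common to the observed descriptions; otherwise None.
    """
    if not descriptions:
        return None
    values = {d[dependent] for d in descriptions}
    if len(values) != 1:
        return None  # the condition side of the rule does not match
    first = descriptions[0]
    # Keep only the conditions shared by every observed description.
    common = {
        key: val
        for key, val in first.items()
        if key not in (dependent, "level")
        and all(d.get(key) == val for d in descriptions)
    }
    common[dependent] = values.pop()
    common["level"] = first["level"] + 1
    return common


# Illustrative data only (not from the source).
observations = [
    {"level": 1, "gas": "helium", "temp": 300, "pressure": 1, "pv": 2494},
    {"level": 1, "gas": "helium", "temp": 300, "pressure": 2, "pv": 2494},
]
summary = generalize_constant(observations, "pv")
```

Here the varying "pressure" condition is dropped, while the shared
conditions and the constant value of the dependent variable are carried
up to the next level.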

2. Memory Organization Packets (MOPs)

Following in a tradition somewhat similar to that of Minsky's frames
[24], Schank and Abelson's scripts [25], and Schank's dynamic memory
[26], UNIMEM, CYRUS, IPP, and RESEARCHER all use memory organization
packets, or MOPs. MOPs are a type of semantic network within which
knowledge is arranged hierarchically by topic. Information that
constitutes a specific exception to the theme of the MOP is given a
separate index, or label, which may be used to locate it more quickly
during the retrieval process. When two or more elements are organized
under one index, a new sub-MOP is formed, thus preserving the
modularity of the memory network.
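A minimal Python sketch may help make the indexing idea concrete. It
assumes that events are flat feature dictionaries and that a MOP's
theme is a set of expected feature values; the class name MOP and its
methods are hypothetical and do not reproduce the data structures of
CYRUS, IPP, UNIMEM, or RESEARCHER.

```python
class MOP:
    """Hypothetical memory organization packet: a theme plus indices."""

    def __init__(self, theme):
        self.theme = dict(theme)  # features expected of every member
        self.indices = {}         # (feature, value) -> event or sub-MOP

    def add(self, event):
        # Index the event by each feature that departs from the theme.
        for feature, value in event.items():
            if self.theme.get(feature) == value:
                continue
            key = (feature, value)
            existing = self.indices.get(key)
            if existing is None:
                self.indices[key] = event
            elif isinstance(existing, MOP):
                existing.add(event)
            else:
                # Two elements now share one index: form a sub-MOP.
                sub = MOP({**self.theme, feature: value})
                sub.add(existing)
                sub.add(event)
                self.indices[key] = sub

    def retrieve(self, feature, value):
        # Follow a single index instead of searching all of memory.
        return self.indices.get((feature, value))


# Illustrative events only (not from the source).
trips = MOP({"purpose": "negotiation"})
first = {"purpose": "negotiation", "place": "Egypt", "year": 1978}
second = {"purpose": "negotiation", "place": "Egypt", "year": 1979}
third = {"purpose": "negotiation", "place": "Israel", "year": 1979}
for trip in (first, second, third):
    trips.add(trip)
```

After the second Egyptian trip is stored, the ("place", "Egypt") index
no longer points at a single event but at a sub-MOP whose own indices
distinguish the two trips by year.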

3.  Propositions 

Propositions have become a fairly common means of abstract knowledge
representation in AI systems. They preserve semantic relationships
between arguments or objects. For instance, the proposition "(kiss Sue
Bill)" preserves the relation "kiss" between the arguments "Sue" and
"Bill". Notice that the proposition's structure is independent of
information order. That is, "(kiss Sue Bill)" does not encode the
difference between "Sue kissed Bill" and "Bill was kissed by Sue".
Only the semantic relation is represented. Propositional
representations are used both in ACT* and the Blocks World system.
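The order independence of propositions can be shown with a small
Python sketch. The field names and the two helper functions are
assumptions made for illustration; they are not drawn from ACT* or the
Blocks World program.

```python
from typing import NamedTuple


class Proposition(NamedTuple):
    """Hypothetical (relation agent recipient) triple."""
    relation: str
    agent: str
    recipient: str


def from_active(agent, verb, recipient):
    # "Sue kissed Bill": the agent is mentioned first.
    return Proposition(verb, agent, recipient)


def from_passive(recipient, verb, agent):
    # "Bill was kissed by Sue": the recipient is mentioned first.
    return Proposition(verb, agent, recipient)


active = from_active("Sue", "kiss", "Bill")
passive = from_passive("Bill", "kiss", "Sue")
```

Both surface orders yield the same triple, so nothing in the stored
form records which sentence was actually presented.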

4. Temporal Strings

Temporal strings are a type of representation used in the ACT* system
to maintain order information. Knowledge about order is important to
many tasks, one of which is the analysis of linguistic information.
Temporal strings provide a more efficient means of storing order than
do propositions, but cannot be used in the place of propositions since
they do not incorporate information about meaning as effectively.

5.  Spatial  Images 

This final type of representation preserves the configuration of
elements in a spatial array. This construct is used in the ACT* system
to model pattern recognition behavior in humans. The important point
to note here about spatial images is that they preserve only
information about the relative physical positions of objects in an
array, and not necessarily information about what these objects
actually look like.

B. Information Storage Strategies

Having established the tools for representing knowledge in our set of
ASs, it is now of interest to explore the strategies used to store new
information in these systems.

As shown in Table 2, one strategy for adding new information to the
system is to learn every new stimulus as it is presented. This simple
approach is used in the UNIMEM, CYRUS, Blocks World, and Meta-DENDRAL
systems. Another alternative is to be more discriminating and store
new information in the network only after it has been shown to have
some importance, for instance after it has occurred several times. The
ACT*, BACON.5, IPP, RESEARCHER, and SAGE.2 systems all employ this
type of approach.
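The difference between the two storage policies can be sketched as
follows. The class name SelectiveStore and the repetition threshold
are assumptions chosen for illustration; the source does not say how
any of the nine systems decides that an item is important enough to
keep.

```python
from collections import Counter


class SelectiveStore:
    """Hypothetical store that keeps a stimulus only after repeats."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # threshold=1 stores everything at once
        self.seen = Counter()       # how often each stimulus has occurred
        self.memory = set()         # stimuli admitted to long-term memory

    def present(self, stimulus):
        """Record one occurrence; return True if it is now stored."""
        self.seen[stimulus] += 1
        if self.seen[stimulus] >= self.threshold:
            self.memory.add(stimulus)
        return stimulus in self.memory


store = SelectiveStore(threshold=2)
results = [store.present("kidnapping-in-Italy") for _ in range(2)]
```

Setting the threshold to 1 recovers the simpler "learn every new
stimulus" policy described first.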


Once the decision has been made to add new information to the system,
the question arises as to how it will be stored in relation to already
existing memory elements. Of course the new data could simply be added
randomly. However, a more strategic approach is to store related items
"near" one another in the network. The ACT*, UNIMEM, CYRUS, IPP, and
RESEARCHER systems all employ this design, linking together items
which are semantically or contextually related.

This method of storage serves two useful purposes. First, provided one
uses the right type of knowledge representation, it will not be
necessary to explicitly store all semantic features associated with
every new element since some of those may already be represented in
nearby (subsuming) structures. Second, retrieval of information is
facilitated when memory is organized thematically. That is, one need
only get to the right area in memory and look there rather than
exhaustively searching the entire network.

Another useful tactic, implemented in UNIMEM, CYRUS, IPP, and
RESEARCHER, is to mark new elements with special indices which denote
the way in which their themes differ from those of their "parent"
nodes in the network. This approach has the same advantages as storing
related items near one another in memory. It is particularly useful in
retrieval, as will be illustrated in the next section.


C.  Retrieval  Mechanisms 

The three types of retrieval mechanisms employed by our set of ASs are
listed in Table 3. The first of these, pattern matching, is a method
whereby an item is retrieved from memory if the physical features of
its representation match those of some retrieval cue. Pattern matching
is commonly used in production systems where rules are selected for
use based on the match between their conditions and the elements in
working memory. The implementation of this mechanism usually involves
exhaustive memory search. The ACT*, BACON.5, Blocks World,
Meta-DENDRAL, and SAGE.2 systems all employ pattern matching.

A very different approach to retrieval is to search the network by
traversing indices associated with individual categorical memory
structures. Recall the MOPs used to represent knowledge in the UNIMEM,
CYRUS, IPP, and RESEARCHER systems. There are indices which denote the
thematic content of an MOP and other indices which tag elements
involving exceptions to these themes. Retrieval begins with an index
which is compared to other indices in the network. When a match is
found, the memory elements subsumed underneath that index are
searched. The result is that the scope of the retrieval process is
limited to only those areas of memory which are the most relevant, a
more efficient strategy than exhaustive
search.

Another way of facilitating economic search is through spreading
activation, as implemented in ACT*. At any given time, the memory
elements that match the contents of working memory are temporarily
activated, and this activation spreads to "nearby" connected elements
in the network. Since the storage strategy in ACT* is to form
connections among related items as they are learned, the nearby items
that get activated will also be related to the contents of working
memory. That is, all activated elements will be potentially relevant
to the current context. Search is quite efficient since the retrieval
process is directed only to the areas of the network that are
activated, rather than to the network as a whole.
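A rough Python sketch of spreading activation follows. It is not the
activation model used in ACT*: the fixed decay factor, the cutoff
floor, and the adjacency-list layout of the network are all
assumptions introduced here to show how activation limits search to
the neighborhood of working memory.

```python
def spread_activation(links, sources, decay=0.5, floor=0.1):
    """Return the activation reached from `sources` over `links`.

    links   -- dict mapping a node to the nodes it is connected to
    sources -- nodes matching the contents of working memory
    decay   -- assumed fraction of activation passed along each link
    floor   -- assumed level below which activation stops spreading
    """
    activation = {node: 1.0 for node in sources}
    frontier = list(sources)
    while frontier:
        node = frontier.pop()
        passed = activation[node] * decay
        if passed < floor:
            continue  # too weak to activate any neighbour
        for neighbour in links.get(node, ()):
            if passed > activation.get(neighbour, 0.0):
                activation[neighbour] = passed
                frontier.append(neighbour)
    return activation


# Illustrative network only (not from the source).
network = {
    "dog": ["bone", "cat"],
    "bone": ["skeleton"],
    "skeleton": ["museum"],
    "museum": ["ticket"],
    "car": ["road"],
}
active = spread_activation(network, ["dog"])
```

Only the nodes present in the returned dictionary would be candidates
for retrieval; unrelated regions such as "car" and "road" are never
visited.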

D.  Principles  of  Control  and  Conflict  Resolution 

As might be expected, the search mechanisms just described often
result in the retrieval of more than one acceptable memory element.
The problem then arises as to how to choose among the potential
candidates, selecting the most appropriate one for instantiation. A
set of principles used to resolve these conflicts is presented in
Table 4.

The most obvious means of deciding among potential elements is to
choose the one which most closely matches the retrieval cue (or
contents of working memory). This strategy is adopted in all of the
ASs we are considering. In practice, however, one might find that as
systems get more and more complex, this simple tactic will not always
work well. That is, there may be several elements which match to the
same degree, in which case further means for choosing among elements
must be provided.

The ACT* and SAGE.2 systems employ several strategies for resolving
these conflicts which rely on analysis of features of the operators
themselves. For instance, each operator in these ASs has some strength
associated with it which is either increased or decreased after each
instantiation, depending on the outcome. Competing operators are then
selected depending upon their relative strengths or histories of being
useful. The field of potential candidates can be narrowed further
using a principle called data refractoriness, whereby no one operator
can serve in two patterns simultaneously. Finally, the ACT* program
relies on a principle of specificity whereby the more specific of two
operators which match equally well will be chosen. For instance, if
the pattern "barn" were being matched and the two elements "ba" and
"bar" were competing, "bar" would be chosen since it is the more
specific.
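The way these principles combine can be sketched in Python. The
ordering used here (degree of match first, then strength, then
specificity) and the use of a prefix test as the "match" are
assumptions made for illustration; the source does not state how ACT*
or SAGE.2 prioritize the principles against one another.

```python
def choose_operator(candidates, target):
    """Pick one operator for `target` (hypothetical ranking).

    Each candidate is a dict with a "pattern" string and a numeric
    "strength". Longer patterns are treated as more specific.
    """
    def rank(op):
        matches = 1 if target.startswith(op["pattern"]) else 0
        return (matches, op["strength"], len(op["pattern"]))

    return max(candidates, key=rank)


operators = [
    {"pattern": "ba", "strength": 1.0},
    {"pattern": "bar", "strength": 1.0},
]
winner = choose_operator(operators, "barn")
```

With equal strengths the longer pattern "bar" wins on specificity,
which mirrors the "barn" example in the text; raising the strength of
"ba" would reverse the outcome under this particular ordering.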

In addition to relying on traits of the individual operators to
resolve conflicts, system behavior may be controlled in both ACT* and
SAGE.2 by contextual constraints. Context is used to govern choice of
operators in SAGE.2 through the "recency of use" rule. That is, the
most recently used operator is chosen over others on the assumption
that it must be the most relevant to the problem at hand. This choice
will have nothing to do directly with the strength of the operator or
whether the pattern is already represented elsewhere.

Perhaps a more interesting mechanism is used in the ACT* program.
Problem solving in ACT* is goal-driven. That is, a large task having
one ultimate goal can be broken down into several subtasks, each
having a goal of its own. The current goal of interest is represented
in working memory and, as such, disallows the instantiation of any
operators not directly related to its completion. This mechanism
greatly reduces the field of potential candidates from the start, thus
eliminating many conflicts that might otherwise arise.

E. Mechanisms for Adaptation

Thus far the types of mechanisms described are not in any way peculiar
to the systems under consideration here. For instance, any traditional
production system has to have some means of retrieving information
from memory (usually pattern matching), and some way to resolve
conflicts among competing memory elements (usually degree of match and
specificity). The trouble with these typical expert systems, however,
is that when they "learn" they do so simply by adding new facts or
rules. These additions are, for the most part, made without regard to
the integration and reorganization of this new information with
existing knowledge [28]. What sets adaptive systems apart from
traditional expert systems is their ability to modify their own
knowledge and control structures with experience in some meaningful
manner. The ways in which adaptation occurs are listed in Table 5.

The most common way of incorporating experience into the system is to
strengthen operators that have either been presented a number of times
or have somehow proven useful in the past. In fact, every AS in our
set of nine except Blocks World employs this method. However, the
converse of this strategy, which is to weaken an operator whose
instantiation has produced undesirable results, is not as popular.
Only the ACT* and SAGE.2 systems employ this tactic. A method more
drastic than weakening the strength of an operator is to remove it
from the network altogether when it ceases to be appropriate for
current applications. This is sometimes done with production rules in
ACT*, but only very conservatively.
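Strengthening, weakening, and removal can be sketched as a single
update step. The function name update_strength, the fixed step size,
and the optional removal floor are assumptions for illustration; the
source gives no numeric details of how ACT* or SAGE.2 adjust operator
strengths.

```python
def update_strength(strengths, operator, succeeded,
                    step=1.0, removal_floor=None):
    """Adjust one operator's strength after an instantiation.

    strengths     -- dict mapping operator name to current strength
    succeeded     -- True strengthens the operator, False weakens it
    removal_floor -- if given, operators at or below it are deleted
    """
    new_value = strengths.get(operator, 0.0) + (step if succeeded else -step)
    if removal_floor is not None and new_value <= removal_floor:
        strengths.pop(operator, None)  # drastic option: drop the operator
    else:
        strengths[operator] = new_value
    return strengths


rules = {"move-smallest-disk": 1.0}
update_strength(rules, "move-smallest-disk", succeeded=True)
after_success = rules["move-smallest-disk"]
update_strength(rules, "move-smallest-disk", succeeded=False)
after_failure = rules["move-smallest-disk"]
```

Leaving removal_floor unset models systems that only strengthen or
weaken; supplying it models the conservative removal described for
ACT*.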

In addition to changing the relative strengths and weaknesses of
operators in the system, it is possible to actually create new ones
using knowledge embedded in previously existing representations. In
particular, generalization involves the formation of new elements
which embody common features of several representations already in the
network. The new operator applies in more cases since it is less
specific than its predecessors. This mechanism is used in every AS
except Meta-DENDRAL and SAGE.2.

In contrast to generalization, new representations may be created
through the process of discrimination. Discrimination occurs when
extra delimiting features are added to an operator which restrict the
contexts in which it can apply. In other words, discrimination creates
constructs that are special cases of already existing ones. This is
the type of process that occurs in CYRUS, for instance, when an index
is added beneath the top level of an MOP to signify that an element
contains information in exception to the theme of the parent MOP.
Although potentially quite powerful, this strategy is employed only in
the ACT* and CYRUS systems.
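Generalization and discrimination can be contrasted with a short
Python sketch. It assumes that an operator is a rule with an "if"
dictionary of conditions and a "then" action; this rule format and the
three function names are hypothetical and are not the representation
used by any of the nine systems.

```python
def generalize(rule_a, rule_b):
    """Keep only the conditions both rules share (wider applicability)."""
    shared = {
        key: value
        for key, value in rule_a["if"].items()
        if rule_b["if"].get(key) == value
    }
    return {"if": shared, "then": rule_a["then"]}


def discriminate(rule, extra_conditions):
    """Add delimiting conditions (narrower applicability)."""
    return {"if": {**rule["if"], **extra_conditions}, "then": rule["then"]}


def applies(rule, situation):
    """A rule applies when every one of its conditions is satisfied."""
    return all(situation.get(k) == v for k, v in rule["if"].items())


# Illustrative rules only (not from the source).
red_block = {"if": {"shape": "block", "colour": "red"}, "then": "stackable"}
blue_block = {"if": {"shape": "block", "colour": "blue"}, "then": "stackable"}
any_block = generalize(red_block, blue_block)
small_block = discriminate(any_block, {"size": "small"})
green = {"shape": "block", "colour": "green"}
```

The generalized rule covers the green block that neither original rule
matched, while the discriminated rule applies only when the added
"size" condition is also met.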

The discussion up to this point has focused on the comparison of our
nine ASs with regard to the types of knowledge they acquire, along
with the types of representations, storage mechanisms, retrieval
strategies, control principles, and means of adaptation used by each.
Now that the ASs have been couched in terms of this organizational
framework, it is of interest to determine whether any guiding
principles may be used in conjunction with this framework to help
researchers both (a) build better ASs and (b) evaluate our nine ASs in
relation to one another.

Building  the  "Ideal"  Adaptive  System 

The remainder of this paper will outline some general guidelines for
building what we call the "ideal" AS. These guidelines will be cast in
terms of the organizational framework already presented. It is hoped
that these principles will not only aid in the design of future ASs,
but might serve as a set of metrics by which to evaluate already
existing systems.

First, it is our feeling that there is no way to select the "best"
representation a priori. In fact some might argue that, as far as the
outward behavior of the system is concerned, any deficiencies
associated with the choice of representation structure may be
compensated for in terms of the procedures implemented to operate on
those structures. We will concede that some representations may lend
themselves to certain types of problems more easily than others,
although this will be a matter of convenience.

Having dispensed with the question of
representation, we will now proceed with
recommendations concerning the other issues of
storage, retrieval, control, and adaptation.
After reading through the earlier comparisons
of the ASs on each of the dimensions just listed,
the reader may feel that a "more is better" rule
is applicable. That is, it might appear that
variety is the key, with the better systems being
those which incorporate several different types
of capabilities in each of these areas. Although
this is true to a certain extent, it is not
necessarily the best prescription for success.

Rather than a "more is better" methodology,
we advocate that researchers adopt an approach
whereby a combination of both top-down and
bottom-up strategies is implemented in the system design.
Bottom-up strategies are those for which properties
of individual operators are important, whereas
top-down strategies involve overall system
behavior. Perhaps this concept is best illustrated
with an example.

In terms of storage mechanisms, a top-down
strategy is to store related items near one another
in the network. This type of strategy impacts
directly on the global organization and performance
of the system. Similarly, the approach of storing
items in a hierarchical organization such that
concepts at a given level subsume those below
and are subsumed by those above will later affect
global retrieval operations.

In contrast, a bottom-up storage strategy
is to learn every new stimulus as it is presented,
regardless of its semantic content, that is,
regardless of its eventual position in the overall
memory network. This strategy has not been
employed as much as it could have been, probably
due to the semantic primacy bias in the human
memory literature. The traditional wisdom
has been that people primarily remember semantic
content in lieu of lower level information
involving physical features of stimuli. Recent
experiments on human memory have shown, however,
that we do indeed remember this low level
information, often better than the semantic
content. To the degree that the human is accepted
as a useful model for designing intelligent
systems, the importance of bottom-up storage
strategies should not be ignored when building
ASs.

Assessing the relative merit of combinations
of storage mechanisms of the ASs from Table 3,
it may be seen that only the UNIMEM and CYRUS
systems employ both top-down and bottom-up
strategies. Specifically, both are designed
to learn every new stimulus as it is presented
and to store related items near one another in
memory. In addition, both of these systems use
the MOP representation from which a hierarchical
organization is created.

Applying the top-down/bottom-up distinction
to retrieval mechanisms, it may be argued that
memory search aided by either index traversal
or spreading activation is top-down since these
mechanisms are implemented with regard to the
memory network as a whole. On the other hand,
pattern matching is a bottom-up retrieval tactic
since it depends only on characteristics of
individual operators. Again, research on human
memory has shown that both bottom-up and top-down
processes play important roles in the determination
of retrieval performance, lending credence
to the assertion that this combination might
be useful in the design of ASs. In Table 4 we
see that the only system employing both top-down
and bottom-up retrieval strategies is ACT*.
Elements may be retrieved from memory in ACT*
using either spreading activation or pattern
matching.

As with storage and retrieval, there are
principles of conflict resolution which embody
both top-down and bottom-up features. For example,
a top-down strategy for delimiting the field
of competing operators is to impose a goal
hierarchy on the system. The choice of operator
then depends on the system's priorities
rather than the traits of any individual operators.

A similar argument may be made for the strategy
of choosing the most recently used operators
over others that match as closely.

All other conflict resolution principles
discussed earlier involve bottom-up analysis.
These include degree of match, specificity, data
refractoriness, and operator strength. Notice
that all of these require that features of
individual operators dictate which will be selected
for implementation as opposed to overall contextual
constraints.
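
As a minimal sketch of how the two kinds of principle might be combined in
a control loop, the following Python fragment first filters the competing
operators by the currently active goal (a top-down constraint) and then
ranks the survivors by degree of match, specificity, and strength
(bottom-up features). The Operator record, its field names, and the
lexicographic ordering of the bottom-up criteria are illustrative
assumptions and are not taken from any of the nine systems.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    goal: str          # goal the operator serves (hypothetical field)
    match: float       # degree of match to working memory, 0..1
    specificity: int   # number of conditions the operator tests
    strength: float    # accumulated operator strength


def resolve_conflict(candidates, active_goal):
    """Pick one operator from a conflict set, or None if none applies."""
    # Top-down: the goal hierarchy delimits the field of competitors.
    in_context = [op for op in candidates if op.goal == active_goal]
    if not in_context:
        return None
    # Bottom-up: features of individual operators decide among the rest.
    return max(in_context,
               key=lambda op: (op.match, op.specificity, op.strength))


ops = [
    Operator("generalize-rule", "learn", 0.9, 2, 0.4),
    Operator("retrieve-episode", "recall", 1.0, 3, 0.9),
    Operator("discriminate-rule", "learn", 0.9, 4, 0.2),
]
chosen = resolve_conflict(ops, active_goal="learn")
print(chosen.name)
```

Here the best-matching operator overall ("retrieve-episode") is never
considered because it lies outside the active goal, which is the sense in
which the goal hierarchy acts independently of the traits of individual
operators.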

Looking to Table 5 again, it may be seen
that the only two systems which employ top-down
and bottom-up strategies for controlling choice
of operators are ACT* and SAGE.2. Both use several
bottom-up strategies. Of greater interest is
that the top-down approach to control in ACT*
is implemented in a goal hierarchy, whereas a
recency criterion serves as a context mechanism
in SAGE.2.

The final dimension to be considered from
the organizational framework is the manner in
which systems are allowed to adapt. Of the
mechanisms presented earlier, adaptation through
generalization and discrimination are top-down
in nature because they occur only when similarities
or differences among several structures are noticed
simultaneously in one context. On the other
hand, the strengthening, weakening, and unlearning
of individual elements all affect only one
structure at a time. Since they do not directly
impact on the status of other contextually-related
elements, these strategies are classified as
being bottom-up. All of the ASs invoke both
top-down and bottom-up adaptation strategies
except Meta-DENDRAL and SAGE.2, which employ only
bottom-up procedures.

Conclusions

The above discussion focused on developing
an organizational framework for comparing adaptive
systems. These systems are of interest relative
to traditional expert systems in that they (a)
do not have to be updated by hand as knowledge
in the domain of interest changes; (b) can start
out at the novice level without explicit access
to a human expert; (c) are useful in "cutting
edge" domains in which few, if any, experts exist;
and (d) might provide better interfaces for novice
users since their own memory structures have
evolved from the novice level. Nine systems
were discussed with regard to an organizational
framework for comparing ASs. This framework
was used to compare the systems on such dimensions
as the type of knowledge acquired by each and
representations used, storage and retrieval
strategies, control mechanisms, and methods used
for adaptation. Based on our experience with
the latest generation of expert systems, we have
recommended that ASs be designed using a
combination of top-down and bottom-up mechanisms
in each of these areas.

The development of this framework for ASs
may also have several additional benefits.
As the most near term example, this framework
can aid in our understanding of AI system
performance. This may allow researchers to address
some of the perennial AI questions, such as how
one measures the robustness of a given AI system,
what are valid benchmarks for AI systems, and
how an AI system can be designed for ease of
testing. On this latter point, research in AS
methodologies could expedite the rapid prototyping
of AI systems, in a manner analogous to the
techniques used in the electronics industry.
Finally, continued development of ASs, particularly
through rapid prototyping, can lead to a faster
incorporation of AI systems into the marketplace.

The BDM Corporation
7915 Jones Branch Drive

Abstract

The  concept  of  Attentive  Associative  Processing  trades  the  spatially  delocalized  and 
self-organizing features of traditional Associative Architectures for improved hardware
efficiency  and  increased  speed  of  adaption.  Optical  computing  may  prove  to  be  ideally 
suited  to  implement  these  architectures. 


Introduction 


Optical computing systems, which hold the promise of massive parallelism in computation
and interconnects, have attracted the attention of researchers for the last 30 years.
The earliest systems proposed and demonstrated used the ability of a simple and passive
(non-power consuming) component like a convex lens to perform a complex 2-D Fourier
transform extremely rapidly. This early work in optical computing/processing was directed
toward processing of synthetic aperture radar data, matched filter detection of images,
image restoration, and radar and sonar signal processing. As this work has matured and
has transitioned to advanced development and deployment, the optical computing research
community has turned its attention to tackling a wider variety of problems. Current
research is evolving along two distinct directions: (1) toward optical logic devices
capable of extremely high speed and/or highly parallel operation, and (2) toward analog
systems that use the massive parallelism in arithmetic operations and interconnects for
novel architectural configurations. In the last two years, the optical computing community
has been strongly motivated to study the work in human cognition, and apply the
architectural concepts developed in modeling neural networks. This new direction has
been driven by the conviction that the human brain derives its astounding power for
cognition and perception from a massively parallel and densely interconnected network
of relatively simple and slow components, i.e. neurons, and that these are exactly
the characteristic strengths of optical systems as conceived in the second direction
mentioned above. The discipline of neural network modeling is in its infancy relative
to more developed fields like signal processing and digital computing, and there is
a notable lack of universally accepted principles to guide the architectural development.
Nonetheless, in the past 40 years a significant amount of interdisciplinary research
has been carried out in this field and has generated valuable insights into the operation
of the human brain. Most notably, the work of Grossberg (Ref. 1), Kohonen (Ref. 2), and
Hopfield (Ref. 3) has influenced the work in optical computing initiated by Psaltis and
Farhat (Ref. 4) and Fisher, Giles, and Lee (Ref. 5). The growth of research in the field
of optical associative processing can be judged by the large number of presentations at
the Associative Memories and Optics Symposium at the 1985 Annual Meeting of the Optical
Society of America as well as at the Los Angeles Symposium of SPIE.


In this paper, we will describe that portion of our work in the area of optical
associative processing that is a departure from the work referred to earlier. In the
earlier work, the associative storage was based mainly on the correlation matrix of
the data vectors. We propose an associative memory model that readily allows a change
in the strengths of stored states, corresponding to a shift in attention (Ref. 6). In
this paper we discuss the implications of attentive associative architectures to optical
computing. We describe the mathematical formulation and optical implementation of
attentive associative memory and its relation to the conventional associative memory,
and we discuss how those concepts can be transferred to different application areas of
optical computing, e.g. relational data base architectures and expert systems.


Attentive associative memory

A linear model of an associative memory designed to store N-dimensional column vectors
is obtained in two steps. The first step is the recording of the set of P input
vectors in an N x N memory matrix via the outer product operation between the input
vectors, and the second step is retrieving the data vector from an incomplete and/or
noisy version of the vector itself via a vector-matrix multiplication. These two steps
are described mathematically in the following equations (Ref. 2):

$$ M = \sum_{i} v^{(i)} \, v^{(i)T} \qquad \text{RECORDING} $$

$$ v = M \, v' \qquad \text{RETRIEVAL} \qquad [1] $$


where v is an N-dimensional column vector, M is an N x N matrix, v' is the imperfect
recall vector, and v^(i) is one of the stored vectors. The last part of eq. [1] can be
rewritten by a substitution for values of M_jk derived from the first part of eq. [1]:

$$ v_j = \sum_{k} \Big\{ \sum_{i} v_j^{(i)} \, v_k^{(i)} \Big\} \, v'_k \qquad [2] $$


The attentive associative memory formulation can be obtained by changing the order of
summation in eq. [2] and inserting a nonuniformly nonlinear operation after the first
summation. The resultant equation is given below:


$$ v_j = \sum_{i} v_j^{(i)} \, F^{(i)}\Big( \sum_{k} v_k^{(i)} \, v'_k \Big) \qquad [3] $$

This equation states that the imperfect input vector v' is first compared to all the
stored vectors in parallel via an inner product, the resultant scalar is transformed
using a channel-dependent nonlinearity, F^(i), and then used as a coefficient in a linear
superposition of the corresponding stored vectors. The output is an estimate of the
stored vector that is closest to the input vector, v'. The nonlinear operation, F^(i),
allows one to suppress spurious correlations and to emphasize the similarity of the input
with a selected vector (i.e. focusing attention on that particular vector).
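
The reordering of the summations can be checked numerically. The sketch
below, written in plain Python with three small toy vectors that are
purely hypothetical, evaluates eq. [2] through an explicit memory matrix
and eq. [3] through inner products followed by a per-channel function. With
the identity in place of F^(i) the two results agree; a simple gating
function (an illustrative choice, not the nonlinearity used in this work)
then shows how weak correlations can be removed before the superposition.

```python
def recall_eq2(stored, probe):
    """Eq. [2]: form M_jk = sum_i v_j^(i) v_k^(i), then v_j = sum_k M_jk v'_k."""
    n = len(probe)
    m = [[sum(v[j] * v[k] for v in stored) for k in range(n)]
         for j in range(n)]
    return [sum(m[j][k] * probe[k] for k in range(n)) for j in range(n)]


def recall_eq3(stored, probe, f=lambda i, c: c):
    """Eq. [3]: inner products first, then F^(i), then superposition."""
    coeffs = [f(i, sum(a * b for a, b in zip(v, probe)))
              for i, v in enumerate(stored)]
    return [sum(c * v[j] for c, v in zip(coeffs, stored))
            for j in range(len(probe))]


stored = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1]]  # hypothetical toy vectors
probe = [1, 1, 0, 0]

linear = recall_eq3(stored, probe)                    # identity in place of F
gated = recall_eq3(stored, probe, f=lambda i, c: c if c >= 2 else 0)
print(recall_eq2(stored, probe), linear, gated)
```

Note that eq. [3] never forms the N x N matrix: it needs only P inner
products and one weighted sum of the stored vectors.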


This basic model of associative memory can be modified in numerous ways, some of them
discussed in Ref. 2. In Ref. 3, Hopfield suggests an iterative procedure where the estimate
of the retrieved vector (v in eq. [1]) is used to calculate an improved estimate. In
that work, the data vectors were chosen to be binary, and this knowledge was used in
hardclipping the retrieved vector before feeding it back to the system. Ref. 4 discusses
the optical implementation of the Hopfield model via an optical vector-matrix multiplier
with feedback and a threshold nonlinearity in the feedback loop. The same procedure
can be applied to the attentive associative memory model described in eq. [3] to improve
the quality of retrieval.

In Ref. 4 an extension of the Hopfield model to storage of images was proposed. Since
images are 2-D matrices, their outer products result in a 4-D tensor (corresponding to
the recording step in eq. [1]). To facilitate the realization of this tensor, Ref. 4
proposed an optical system equivalent to eq. [3] in that it also performs the inner product
between the images before forming the linear superposition of the stored images. The
system, however, did not involve the nonlinear step contained in eq. [3]. A recent paper
by Soffer et al. discusses a holographic implementation of the associative memory for
storing images, in which the correlation between the input and the stored images was
subject to a nonlinear operation (Ref. 7). That system did not contain provision for a
channel-dependent nonlinearity and for attention.


Implementation of attentive associative memory


The attentive associative memory described in equation [3] can be implemented optically
by two vector-matrix multipliers that are connected in a loop with appropriately designed
point nonlinearities between them. The schematic diagram of the resulting system is
shown in Figure 1. The first vector-matrix multiplication performs an inner product
(correlation) between the corrupted input vector and all the stored vectors in parallel.
The resultant inner products are transformed via the nonlinear operation F^(i) that could
be different for each channel. The transformed inner products are then input to the
second vector-matrix multiplier, which calculates a linear superposition of the stored
vectors. This linear superposition is subject to another nonlinear transform that reflects
a priori knowledge about the nature of the data vectors (e.g. positivity, binary values,
etc.). The result is an improved estimate of the stored vector to which the corrupted
input vector corresponds. This estimate is fed back to the first vector-matrix multiplier
and the procedure is repeated until a stable state is reached.


The behavior of the attentive associative memory is governed by the nature of the
nonlinearities that operate in the data domain and the inner product domain. In
this work, binary vectors were chosen as the data vectors. The nonlinear transform chosen
is shown in Figure 2. It contains a threshold level for the input signal below which
the output is set to zero, and a saturation level for the output. For intermediate values
of the input, the transformation is linear. The critical parameters for this function
are threshold and gain values. In our earlier publication (Ref. 6) we discussed results obtained
when  these  parameters  were  held  fixed  during  the  successive  iterations.  Although  the 
system  still  showed  the  desired  behavior  of  error  correction  and  attention,  it  was  noted 
that  the  system  can  only  correct  either  bit-dropout  or  cross-talk  type  of  errors  at  one 
time.  In  this  paper  we  describe  one  method  of  adaptively  setting  the  values  of  the 
parameters during each iteration. The basic principle used is calculating the average
level  of  activity  in  the  data  vector  or  the  inner  product  vector  during  each  iteration 
and  setting  the  threshold  point  and  the  saturation  point  in  the  nonlinear  transfer  function 
as  certain  multiples  of  that  value.  Thus,  the  parameters  are  determined  through  global 
activity  measurement  but  are  applied  to  the  vectors  point-by-point.  The  result  is  that 
values  that  are  below  average  are  suppressed  and  values  that  are  much  above  average  are 
clipped  to  the  maximum  allowable  output  value.  Since  these  nonlinearities  are  incorporated 
in  both  the  data  domain  and  the  inner  product  domain,  the  iterative  process  tends  to 
drive the vector to its nearest neighbor among the set of stored vectors. It should
be noted that this is accomplished even when the stored vectors have a high degree of
cross-talk. The conventional models of associative memory that are based on correlation
matrix formulation either assume orthogonal or near-orthogonal stored vectors, or produce
results that are optimum only in a least-squares sense without being exact when the
cross-talk is significant. The attention of the system, or the strength of a given stored
state, can be manipulated by changing the nonlinear transfer function associated with
that  vector  in  the  inner  product  domain.  A  convenient  way  of  changing  the  function  without 
changing  its  shape  is  to  modify  the  multipliers  that  determine  the  threshold  and  saturation 
values  with  respect  to  the  average  activity  level  in  the  inner  product  domain.  Lowering 
the  threshold  as  well  as  the  saturation  level  will  increase  the  attention  provided  to 
that  vector,  and  increasing  these  parameters  will  reduce  the  attention  given  to  that 
vector.  This  easy  manipulation  of  the  strengths  of  the  stored  states  is  not  achievable 
with  the  conventional  associative  memory  models,  as  that  requires  recalculation  of  all 
the  elements  of  the  correlation  matrix  in  order  to  accomplish  the  same  purpose. 
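
The adaptive setting of the transfer function can be sketched in a few
lines of Python. The function below measures the average activity of a
vector, places the threshold and saturation points at given multiples of
that average, and then applies the resulting curve point-by-point. The
function name, the exact shape of the linear segment (a ramp rising from
zero at the threshold to the maximum output at the saturation point), and
the multipliers used in the example call are assumptions made for
illustration.

```python
def adaptive_transfer(values, threshold_mult, saturation_mult, out_max=1.0):
    """Threshold / linear / saturation curve scaled by the average activity."""
    avg = sum(values) / len(values)       # global activity measurement
    if avg <= 0:
        return [0.0] * len(values)        # no activity: everything suppressed
    lo = threshold_mult * avg             # threshold point
    hi = saturation_mult * avg            # saturation point
    out = []
    for x in values:                      # applied point-by-point
        if x < lo:
            out.append(0.0)
        elif x >= hi:
            out.append(out_max)
        else:
            out.append(out_max * (x - lo) / (hi - lo))
    return out


inner_products = [8, 4, 4, 4]             # one strong match, three weak ones
weights = adaptive_transfer(inner_products, 0.8, 1.6)
print(weights)
```

Lowering both multipliers for one channel moves its whole curve to the
left, so the same inner product yields a larger coefficient; under this
sketch that is how attention to a particular stored vector would be raised
without touching the stored data.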

Computer  simulations 

The  attentive  associative  memory  described  in  the  previous  section  was  simulated  on 
a  Personal  Computer.  Four  16-bit  binary  vectors  were  stored  in  an  attentive  associative 
memory.  The  four  vectors  to  be  stored  are  given  below: 

A    1111000011110000

B    1010101010101010

C    1010010110100101

D    1001011001101001          [4]

It can be seen that these vectors have a large overlap. This observation can be quantified
by calculating inner products between all possible pairs of these vectors. The results
are given below:

$$ A \cdot A = B \cdot B = C \cdot C = D \cdot D = 8 $$

$$ A \cdot B = A \cdot C = A \cdot D = B \cdot C = B \cdot D = C \cdot D = 4 $$
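
These values are easy to confirm. The short Python sketch below (the
dictionary and helper names are arbitrary) computes every pairwise inner
product of the four stored vectors.

```python
STORED = {
    "A": "1111000011110000",
    "B": "1010101010101010",
    "C": "1010010110100101",
    "D": "1001011001101001",
}


def inner(a, b):
    """Inner product of two binary vectors given as strings of 0s and 1s."""
    return sum(int(x) * int(y) for x, y in zip(a, b))


gram = {(p, q): inner(STORED[p], STORED[q]) for p in STORED for q in STORED}
for p in STORED:
    print(p, [gram[(p, q)] for q in STORED])
```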

The auto- to cross-correlation ratio is therefore only 2:1. In the conventional models
of associative memory described by eq. [1], the retrieval works perfectly only when the
vectors to be stored are orthogonal to each other. The use of a nonlinearity and an
iterative procedure described by Hopfield relaxes the requirement on the set of input
vectors to pseudo-orthogonality, but still assumes that the auto- to cross-correlation
ratio is equal to square root of N, where N is the size of the vector. Therefore, when
the vectors shown in eq. [4] are stored in a conventional associative memory, erroneous
results are obtained even when a perfect version of the input vector is available for
recall. For example, the presentation of vector A leads to a stable state

1011000011100000

when the vectors are hardclipped at a level 8 to binarize them. A different level for
hardclipping changes the nature of the errors but not their number. The use of adaptive
nonlinearity of the form described in the earlier section does not help the situation
and will give similarly erroneous results with the data shown in equation [4].
This fact indicates that if the stored vectors have a high degree of cross-talk, then
using an iterative procedure or a nonlinearity in the data domain is not adequate.
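
The failure of the conventional model on these vectors can be reproduced
with a short Python sketch. It applies the correlation-matrix recall of
eq. [1] followed by hardclipping, and repeats until the vector stops
changing. Two details are assumptions: the diagonal of the memory matrix
is kept, as eq. [1] is written, and "hardclipped at a level 8" is read as
"set to 1 only if strictly greater than 8".

```python
STORED = [
    "1111000011110000",  # A
    "1010101010101010",  # B
    "1010010110100101",  # C
    "1001011001101001",  # D
]


def bits(s):
    return [int(c) for c in s]


def correlation_recall(probe, stored, clip_level, max_iter=20):
    """Iterate x <- hardclip(M x) with M = sum_i v^(i) v^(i)T."""
    vecs = [bits(s) for s in stored]
    x = bits(probe)
    for _ in range(max_iter):
        # M x computed as sum_i v^(i) (v^(i) . x); identical to eq. [1].
        coeffs = [sum(a * b for a, b in zip(v, x)) for v in vecs]
        y = [sum(c * v[j] for c, v in zip(coeffs, vecs))
             for j in range(len(x))]
        new_x = [1 if yj > clip_level else 0 for yj in y]
        if new_x == x:
            break
        x = new_x
    return "".join(str(b) for b in x)


result = correlation_recall(STORED[0], STORED, clip_level=8)
print(result)
```

Under these assumptions a perfect copy of A settles into 1011000011100000,
in which two of the bits of A have been lost even though the input
contained no errors.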
The attentive associative memory contains a provision for introducing a nonlinear
transformation on the inner products and hence can suppress cross-talk effectively before
the constraints in the data domain are applied to the linear combination of the stored
vectors. The parameters for nonlinear transformation in the inner product domain can
be determined as follows. When the input vector is the exact version of the stored vector
A, the inner products with all of the stored vectors will be (8, 4, 4, 4), thus presenting
an average activity level of 5. Since the cross-talk is known to be 4, the threshold
can be set at 0.8 times the average activity value. The saturation level can be set
at 8, i.e. 1.6 times the average activity value. In the data plane, on the other hand,
the average activity value for one of the four stored vectors is 0.5. We can set the
threshold of the nonlinear transformation to be the average activity value. Computer
simulations were performed with these parameter settings. Even when the input vectors
could contain bit dropouts as well as increased cross-talk, the attentive associative
memory was shown to extract the stored vector that the input is closest to in the sense
of having the largest inner product. We show two examples below from the computer
simulations.


EXAMPLE I

INPUT VECTOR        1101000001100000

RETRIEVED VECTOR    1111000011110000    A


EXAMPLE II

INPUT VECTOR        1010001011101100

RETRIEVED VECTOR    1010101010101010    B


In the first example, three bits were missing from the vector A (bit positions 3, 9,
and 12, counting from the left). The attentive associative memory successfully retrieved
the full vector A in one iteration. The second example presented a more complicated
scenario. The input vector has an inner product of 6 with vector B and an inner product
of 5 with the other three vectors. Thus a combination of cross-talk and bit-dropout
reduced the auto- to cross-correlation ratio even further. The attentive associative
memory correctly retrieved vector B after three iterations as the stored vector closest
to the input vector. Another example was designed to demonstrate the ability of the
system to give a higher weight to a chosen vector, thus retrieving it preferentially
over the other vectors. The input vector was chosen to have equal value for the inner
product with vector A and vector C. The transfer function for channel A was, however,
modified by increasing its slope by a factor of 2 compared to the other channels. The
attentive associative memory then retrieved A.


EXAMPLE III

INPUT VECTOR        1011000110011100

RETRIEVED VECTOR    1111000011110000    A

Without attention, the system can be made to settle into a null state, indicating the
input was ambiguous, or it can be made to settle into a partial pattern that is an overlap
between vector A and vector C. The particular behavior is determined by the choice for
other parameters for the nonlinear transformations.
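
The three examples can be explored with the following Python sketch of the
iteration loop. It is a reading of the description above rather than the
simulation program itself, and it rests on several assumptions: the inner
product transfer function is a ramp that rises from zero at the threshold
to one at the saturation point; the threshold and saturation multipliers
are 0.8 and 1.6, so that they fall on the cross-talk level 4 and the
auto-correlation level 8 when the average is 5; the data-domain
nonlinearity sets a bit to 1 only when it lies strictly above the average
activity; and attention is modelled as a per-channel multiplier on the
slope of the ramp.

```python
STORED = {
    "A": "1111000011110000",
    "B": "1010101010101010",
    "C": "1010010110100101",
    "D": "1001011001101001",
}


def bits(s):
    return [int(c) for c in s]


def channel_transfer(c, avg, gain=1.0, t_mult=0.8, s_mult=1.6):
    """Inner product nonlinearity, scaled by the average activity."""
    lo, hi = t_mult * avg, s_mult * avg
    if avg <= 0 or c < lo:
        return 0.0
    return min(1.0, gain * (c - lo) / (hi - lo))


def recall(probe, gains=None, max_iter=10):
    """Iterate inner products -> F^(i) -> superposition -> hardclip."""
    gains = gains or {}
    names = list(STORED)
    vecs = [bits(STORED[k]) for k in names]
    x = bits(probe)
    for _ in range(max_iter):
        c = [sum(a * b for a, b in zip(v, x)) for v in vecs]
        avg_c = sum(c) / len(c)
        g = [channel_transfer(ci, avg_c, gains.get(k, 1.0))
             for ci, k in zip(c, names)]
        y = [sum(gi * v[j] for gi, v in zip(g, vecs)) for j in range(len(x))]
        avg_y = sum(y) / len(y)
        new_x = [1 if yj > avg_y else 0 for yj in y]
        if new_x == x:          # stable state reached
            break
        x = new_x
    return "".join(str(b) for b in x)


print(recall("1101000001100000"))                     # Example I
print(recall("1010001011101100"))                     # Example II
print(recall("1011000110011100"))                     # Example III, no attention
print(recall("1011000110011100", gains={"A": 2.0}))   # Example III, attention on A
```

Under these assumptions the sketch returns A for Example I and B for
Example II. For the Example III input it settles on 1010000010100000, the
bits shared by A and C, when all channels are treated alike, and on A when
the slope of channel A is doubled. The number of iterations it needs may
differ from the counts reported above, since the exact transfer functions
used in the simulations are not specified here.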

Optical implementation

Figure 1 indicated that the attentive associative memory contains two vector-matrix
multipliers connected in a loop with nonlinearities in between them. The matrix in both
the multipliers was, however, identical. This fact can be exploited to design an optical
attentive associative memory with bi-directional propagation of light and a common matrix
mask. A compact structure can be realized by using long finger-like modulators and
detectors to perform the operation of broadcasting and summing, respectively, that are
required in vector-matrix multipliers. The schematic diagram of such a compact
architecture is shown in Figure 3. The active part of the system consists of an
optoelectronic panel containing pairs of detectors and light modulators that are
electrically connected to each other through an amplifier and nonlinear circuit. The
current out of the detector stripe is proportional to the sum of the light distribution
on it. That signal is amplified and processed before applying it to the light modulator
stripe, which then broadcasts it to its entire length uniformly. The system shown in
Figure 3 is designed to store three vectors, each with four elements. Thus the input
panel contains four pairs of detector-modulator stripes, each three-elements long. The
other panel is placed in the correlation domain and contains three pairs of
detector-modulator stripes, each four-elements long, that are orthogonally oriented with
respect to the input panel stripes. The matrix mask sandwiched between the two panels
contains the three four-element vectors. The initial vector can be applied to the input
panel via an optical signal to the photodetector or via an electrical signal to the
modulator. This vector is then broadcast to all of the vectors of the matrix mask via
the stripe modulator. The transmitted light contains the element-by-element multiplication
between the initial vector and all the stored vectors. The stripe detector in the
correlation domain now sums the products along a row, thus performing a vector-vector
inner product. These inner product (correlation) results are nonlinearly amplified and
applied to the modulators, which broadcast them to the corresponding row vector in the
matrix mask. The backward propagating light now performs a scalar-vector multiplication
per channel. The detectors in the input panel now perform a weighted sum of all the
stored vectors, thus calculating a new estimate of the initial vector. The detector
outputs are nonlinearly amplified before driving the modulator stripes, at which point
the cycle repeats.
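
The way a single mask serves both directions of propagation can be sketched
numerically. In the Python fragment below the mask holds three four-element
vectors, matching the size of the system in Figure 3, but the particular
bit patterns are hypothetical, as are the function names. The forward pass
multiplies element by element and sums along each row; the backward pass
scales each row by one weight and sums down each column. The nonlinear
amplifiers between the two passes are omitted.

```python
MASK = [            # one stored vector per row (hypothetical contents)
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]


def forward_pass(mask, x):
    """Input stripes broadcast x over every row; row detectors sum the products."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in mask]


def backward_pass(mask, weights):
    """Row stripes broadcast one weight each; column detectors sum the products."""
    return [sum(w * row[j] for w, row in zip(weights, mask))
            for j in range(len(mask[0]))]


x = [1, 1, 0, 0]
corr = forward_pass(MASK, x)                 # inner products with each stored vector
estimate = backward_pass(MASK, [1, 0, 0])    # only the first channel passes
print(corr, estimate)
```

The forward pass corresponds to multiplying by the mask and the backward
pass to multiplying by its transpose, which is why one physical mask can
stand in for both vector-matrix multipliers of Figure 1.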

• 

The system shown in Figure 3 can be simulated with discrete off-the-shelf components,
such as LEDs and photodetectors. The schematic diagram of the input panel, consisting
of LEDs and photodetectors, is shown in Figure 4. Three discrete photodetectors, connected
in parallel, replace the stripe detector, and three LEDs connected in series replace
the stripe modulator. An electronic amplifier module per channel implements the desired
nonlinear amplification of the signal. An optoelectronic testbed capable of storing
four 16-bit vectors was fabricated. Figure 5 shows a photograph of the finished unit.
The initial vectors can be input via 16 potentiometers on the front panel. The offset
and gain in the correlation domain for each of the four stored vectors can also be
controlled via 8 potentiometers on the front panel. One control adjusts the threshold
level of the hardclipping operation in the input panel. A film mask was prepared encoding
the vectors shown in equation [4]. The operation of this unit was tested and results
consistent with the computer simulations were obtained.


Applications of attentive associative networks

The previous sections described a model of an attentive associative network that was
used in storing data vectors and retrieving them from incomplete and/or noisy versions
of the same vectors. This is only one particular application of the general concept
of an attentive associative network. The main idea behind the attentive associative network
can be described by the following steps:


(1) Project the input vector in the space spanned by the first set of data vectors
(possible input vectors).

(2) Transform the projection values using the adaptive nonlinear transform chosen.

(3) Perform a back projection operation on the space spanned by the second set of
data vectors (possible output vectors).

(4) Transform the back projected vector using another adaptive nonlinearity that
reflects our a priori knowledge about the output domain.

(5) Reverse the entire process by first projecting the calculated output vector
on the space spanned by the possible output vectors and finally backprojecting
on the space of possible input vectors.

(6) Now the estimate of the input vector can be transformed to reflect our a priori
knowledge about the input vector.


This transformed estimate of the input vector is used as the starting point
for step one and the entire operation is repeated. The result of this iterative
process is the calculation of the appropriate output corresponding to the input
that is the nearest neighbor of the given vector. Thus, the a priori knowledge
constraints in the input as well as the output domain are used in conjunction
with the nonlinear suppression of the cross-talk in the projection to carry out the
desired mapping between the input and output even when the input is partial and/or noisy.
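
The six steps can be sketched for the case where the input and output vector sets differ. This is only an illustration under stated assumptions: bipolar vectors, stored pairs matched by position in two lists, a cubic sharpening of the projection values standing in for the adaptive nonlinearity, and a sign threshold standing in for the a priori knowledge about each domain. The function names are hypothetical.

```python
# Sketch of the six-step input/output association loop (names are hypothetical).

def project(v, basis):
    """Projection values: inner product of v with each basis vector."""
    return [sum(a * b for a, b in zip(v, row)) for row in basis]

def back_project(weights, basis):
    """Weighted sum of the basis vectors."""
    n = len(basis[0])
    return [sum(w * row[j] for w, row in zip(weights, basis)) for j in range(n)]

def sharpen(values, power=3):
    """Assumed nonlinearity that suppresses cross-talk between stored pairs."""
    return [max(c, 0.0) ** power for c in values]

def sign_clip(v):
    """Assumed domain constraint: every element must be +1 or -1."""
    return [1 if c > 0 else -1 for c in v]

def associate(x, inputs, outputs, iterations=3):
    """Return the cleaned input estimate and its associated output."""
    y = None
    for _ in range(iterations):
        w = sharpen(project(x, inputs))            # steps (1) and (2)
        y = sign_clip(back_project(w, outputs))    # steps (3) and (4)
        w = sharpen(project(y, outputs))           # step (5), projection
        x = sign_clip(back_project(w, inputs))     # step (5) backprojection, step (6)
    return x, y
```

Feeding in a noisy copy of a stored input returns both the cleaned input and the output vector stored at the same position.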

The fundamental operations described above are common to several application
areas. In this section, we discuss two particular application areas, namely, Relational
Database Architectures and Expert Systems. The aim of this discussion is to provide
a plausibility argument for applying the attentive associative network concepts to these
areas and to encourage the design of appropriate optical architectures to handle these
applications.

Relational database architectures

Retrieval of information from databases is a critical activity in a variety of
applications.  In  particular,  expert  systems  require  rapid  access  to  their  data  or 
knowledge  bases.  Many  expert  systems  utilize  relational  databases  for  storage  of  knowledge 
and  information.  In  the  following  discussion,  we  describe  the  general  organization  of 
databases  and  suggest  how  optical  attentive  associative  memories  may  be  organized  to 
perform  as  relational  database  machines. 

There are several types of database systems. The most familiar are the
FILE-MANAGEMENT systems, which maintain a file of records. Such systems provide their
users with the ability to save and retrieve blocks of information in files. Much like
card files, a file-management system allows its user to deal with unit blocks of
information. These files suffer from the same drawbacks as files of cards and can only
be used to store and retrieve complete records at any time. Additionally, all records
are stored and accessible according to one particular index. Advanced file-management
systems incorporate the additional capability of sorting the records using several
different keys or indices and provide multiple access paths to information. Changing
the amount of information that an individual record holds, by adding new fields, usually
requires changing all the records of the database. Retrieval of information requires
the use of searching, sorting, and selection procedures. A complex query to such a
database may require a complex assortment of searches, sorts and selections from retrieved
information.

Relational databases resemble file management systems in that they store records of
information that are made up of fields. However, information in a relational database
may be stored in several different forms or different types of records. Each of these
different record types may be thought of as being contained in a separate file. Relational
databases allow the user to apply relational algebra operations to these multiple record
structures in order to create new structures and files. The operations of Union,
Intersection, Projection, Cartesian Product, Set Division and Set Complement can be applied
to the database. The power of relational databases is inherited from relational algebra
and its capability to implement search and retrieval of information from the database.
Any database query can be reconstructed as a series of relational algebra operations.
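
As a small illustration of composing a query from relational algebra operations, the sketch below models relations as sets of tuples. The staff and site data are invented, the function names are hypothetical, and the selection operation is the standard relational algebra operator rather than one named in the list above.

```python
# Relations modelled as sets of tuples (illustrative data, hypothetical names).
from itertools import product

def union(r, s):
    return r | s

def intersection(r, s):
    return r & s

def projection(r, columns):
    """Keep only the listed column positions of every row."""
    return {tuple(row[c] for c in columns) for row in r}

def cartesian_product(r, s):
    """Concatenate every row of r with every row of s."""
    return {a + b for a, b in product(r, s)}

def select(r, predicate):
    """Standard selection operator: keep the rows satisfying a condition."""
    return {row for row in r if predicate(row)}

# Invented relations: (name, department) and (department, building).
staff = {("ada", "optics"), ("bob", "ai"), ("cy", "optics")}
sites = {("optics", "B1"), ("ai", "B2")}

# "In which building does each person work?" as a chain of operations.
paired = cartesian_product(staff, sites)                  # 4-column rows
matched = select(paired, lambda row: row[1] == row[2])    # equal departments
answer = projection(matched, (0, 3))                      # (name, building)
```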

Relational set operations can be developed from OR, AND and NOT operations. With
sufficiently large and sensitive spatial light modulators and detectors, an optical
implementation of a relational algebra machine could be constructed and operate at high
speeds on a relational database. Union can be performed as an OR of two files.
Intersection can be performed as the AND of two files or, equivalently, as the NOT of
the OR of the NOT of each of the two files. The key element
in all of these operations will be the ability of the attentive associative memory to
implement relations. Figure 6 presents a logical description of a possible mechanism
for computing the intersection of Relation A and Relation B. We assume that attentive
associative memories can be used to compute relations and that components capable of
computing boolean AND are available. In this system, the entire data set is simultaneously
passed through memories A and B to determine Relation A and Relation B and the output
is ANDed to determine those points which lie in the intersection of the two relations.
In Figure 7, we suggest a mechanism for cascading relations by using the output of one
memory as input to another. The first memory acts as a filter, selecting objects within
its relation; these output objects are reconstituted in an inverted memory and then passed
through the memory implementing the second relation.
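
The logic of Figures 6 and 7 can be mimicked with one membership bit per record. This is a sketch under assumptions: ordinary predicates stand in for the attentive associative memories, the records are invented, and all names are hypothetical.

```python
# Boolean-mask sketch of union, intersection and cascading (hypothetical names).

def relation_mask(records, predicate):
    """One bit per record: 1 if the record satisfies the relation."""
    return [1 if predicate(r) else 0 for r in records]

def bit_or(a, b):
    return [x | y for x, y in zip(a, b)]

def bit_and(a, b):
    return [x & y for x, y in zip(a, b)]

def bit_not(a):
    return [1 - x for x in a]

def cascade(records, first, second):
    """Filter by the first relation, then pass the survivors to the second."""
    survivors = [r for r in records if first(r)]
    return [r for r in survivors if second(r)]

records = [3, 8, 12, 15, 20]          # invented data set

def is_even(r):                       # stands in for Relation A
    return r % 2 == 0

def over_ten(r):                      # stands in for Relation B
    return r > 10

mask_a = relation_mask(records, is_even)
mask_b = relation_mask(records, over_ten)
both = bit_and(mask_a, mask_b)        # intersection, as in Figure 6
either = bit_or(mask_a, mask_b)       # union
```

Because both masks are computed over the whole data set at once, the AND step is a single element-wise operation, which is the property an optical implementation would exploit.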

Associative  architectures  in  expert  systems 

The ideas developed for applying associative architectures to database systems can
potentially impact the performance of expert systems, a well-known discipline within
applied artificial intelligence. Expert systems, as the name implies, seek to emulate
human expertise in specialized areas, known in AI parlance as domains, and embed that
expertise within the memory or knowledge base of computing systems. Examples include the
interpretation of spectral data (the DENDRAL project), the configuration of minicomputers
from components (the R1 project), and the diagnosis of diseases in internal medicine
(the MYCIN project). These computer systems "achieve high levels of performance in task
areas that, for human beings, require years of special education and training."

Obviously, the amount of knowledge that must be stored in order to be successful in
this endeavor is very great, and this places severe demands upon both the organization
of memory and the recall processing of these systems. Not only do these data bases contain
the collection of facts that are relevant to the problem domain of a particular system,
but they also contain rules that enable the intelligent manipulation of the facts. Two common
techniques exist for retrieval. The first uses multiple indices to associatively recall
memory elements. The other major type of retrieval is based on pattern matching, where
data is retrieved according to some pattern which is related to data categories.

It is important to note that these retrieval schemes differ significantly from those
used in numeric computers which store data according to memory addresses. In AI systems,
no one knowledge element exactly matches the desired retrieval state, so that very often
there are several candidates for recall. A system that can look for the closest match,
especially when dealing with incomplete data and references, would reduce delays associated
with this process.
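
Closest-match retrieval from an incomplete reference can be illustrated in a few lines. This is a sequential sketch, not the associative hardware: the knowledge elements are invented tuples, `None` marks a field missing from the query, and the scoring rule (count of agreeing fields) is an assumption.

```python
# Closest-match retrieval with incomplete queries (hypothetical names and data).

def match_score(query, element):
    """Count agreeing fields; None in the query marks a missing field."""
    return sum(1 for q, e in zip(query, element) if q is not None and q == e)

def closest_matches(query, knowledge_base):
    """Return every stored element that ties for the best score."""
    scores = [match_score(query, element) for element in knowledge_base]
    best = max(scores)
    return [e for e, s in zip(knowledge_base, scores) if s == best]

knowledge_base = [
    ("red", "round", "apple"),
    ("red", "long", "pepper"),
    ("green", "round", "lime"),
]
```

A query such as `("red", None, None)` returns two candidates, while `("red", "long", None)` narrows the recall to one, which mirrors the several-candidates situation described above.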

In any discussion of retrieval, one must go far beyond the fundamental techniques
for recognizing relevant data in the knowledge base to a discussion of how one searches
for the set or sets of data that can lead to a defined goal. This is clearly beyond
the scope of this paper, but is discussed thoroughly in Ref. 8. However, many expert
systems, and most AI systems in general, use the LISP (LISt Processing) language to develop
their programs. One feature of this language is its ability to process information
recursively, and hence generate an expectation for an element to be retrieved from memory.
This expectation, which is derived from the state of the machine at the specific point
in time, is clearly analogous to the concept of "attention" in the associative architecture
discussed so far.

Finally, it is the combination of the searching of the knowledge base and the retrieval
techniques discussed above that forms the major computational bottleneck in expert systems.
The searching over extensive knowledge bases is typically reduced by using knowledge
about the system's data base to "prune" the search process. However, for each element
searched, an association or matching operation must be performed, causing the problem
to worsen as the knowledge base grows. Memory organizations which can effectively
delimit the size of the search space and can efficiently match patterns, such as the
attentive associative memory, should help alleviate this bottleneck.

Figure 4. The schematic diagram of the components of an optoelectronic testbed simulating
the design shown in Figure 3 with discrete components.

Figure 5. The photograph of an optoelectronic testbed demonstrating an optical attentive
associative memory.

Figure 6. Schematic diagram of a system for computing an intersection between relation
A and relation B.

Figure 7. Schematic diagram of a system for cascading relations A and B.

ABSTRACT

The  operation  of  multiplying  two 
matrices  is  one  of  the  fundamental 
operations  that  is  frequently  encountered 
in  signal  processing,  image  processing 
and  numerical  computations.  This 
operation  can  be  performed  by  optical 
systems  employing  parallelism.  This  paper 
discusses  different  architectures  for 
optical  processors  performing  this 
operation.  The  architectures  are  divided 
into  several  classes  according  to  the 
degree  of  parallelism  entailed  and  the 
type  of  interconnections  employed. 


INTRODUCTION 

The operation of multiplying two
matrices is one of the most common
operations encountered in signal
processing, image processing, and
numerical computation involving solutions
of a system of linear simultaneous
equations. This operation is
mathematically defined in equation [1]
below:

$$C_{ij} = \sum_{k=1}^{N} A_{ik} B_{kj} \qquad [1]$$

for i = 1 to N and j = 1 to N.

In FORTRAN, this operation can be
coded by the following program:

      DIMENSION A(N,N), B(N,N), C(N,N)
C     C IS ASSUMED TO BE ZERO ON ENTRY
      DO 10 I = 1, N
      DO 10 J = 1, N
      DO 10 K = 1, N
      C(I,J) = C(I,J) + A(I,K) * B(K,J)
   10 CONTINUE

It should be noted that there are
three DO loops in the program with N
passes per loop. Since the operations
performed in the loops are multiplication
and addition, this program involves N^3
multiplications and additions. The
indices I and J correspond to the array
indices of the output matrix C, and the
index K is the common array index for the
input matrices A and B, over which
summation is performed.
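
The operation count can be checked directly with a sequential sketch of the same triple loop. The function name and the counter are illustrative additions, not part of the original program.

```python
# Sequential triple loop with an explicit multiply-add counter (hypothetical name).

def matmul_count(a, b):
    """Multiply two N x N matrices and count the multiply-add operations."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    multiply_adds = 0
    for i in range(n):            # index of the output row
        for j in range(n):        # index of the output column
            for k in range(n):    # common index that is summed over
                c[i][j] += a[i][k] * b[k][j]
                multiply_adds += 1
    return c, multiply_adds
```

For N = 2 the counter reaches 8, and in general it reaches N^3, which is the figure the parallel architectures aim to divide among optical channels.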

Optical processing systems are
capable of performing the operations of
analog multiplication and addition in
parallel between one- or two-dimensional
arrays of positive real numbers and
achieving global communication between
them. This property of optics has been
used  in  the  past  for  signal  processing 
and  image  processing  operations  primarily 
based  on  Fourier  transforms.  Recently, 
optical  processing  systems  have  been 
investigated  for  performing  matrix 
operations  in  parallel,  which  will  make 
them  more  widely  applicable.  The  large 
variety  of  optical  architectures  reported 
in  the  literature  can  be  classified  into 
three  different  categories  according  to 
the  level  of  parallelism: 

(i) one-dimensional systems
performing N operations in parallel

(ii) two-dimensional systems
performing N^2 operations in parallel

(iii) two-dimensional systems
employing multiplexing performing N^3
operations in parallel.

The  first  two  categories  can  be 
further  subdivided  according  to  which  of 
the  DO  loops  in  the  program  for  matrix 
multiplication  are  implemented  in 
parallel.  The  optical  processors  in  each 
subcategory  can  be  implemented  with 
different  technologies,  such  as 
integrated  optics,  acoustooptics, 
electrooptics, etc. A further
classification results when one considers
which parameters (e.g. time or space,
temporal or spatial frequency) are used
to multiplex the operations and which are
used for summation/integration. A truly
comprehensive study which includes a
detailed analysis of all of these
variations would indeed be massive. In
this paper we will briefly discuss the
different optical architectures in a
generic way. We will place particular
emphasis on the types of interconnects
involved in each of these architectures,
since this will highlight the special
advantages offered by the use of optics.
Other details can be found in the
articles that are referenced at the end
of the paper.


The optical processors in this
category (the one-dimensional systems)
perform N multiplications and/or
additions in parallel. If we look at the
program listing, we get three
choices for the DO loop index that we
could implement in parallel with an
optical system. The indices I and J
correspond to the array indices of the
output matrix C, and hence are equivalent
for the purpose of this analysis. Hence
there are two choices: (i) parallelizing
the DO loop over index I (or J), and
(ii) parallelizing the DO loop over index
K.


(i) This choice of the DO loop
index (I) leads to optically performing N
multiplications in parallel and the
summation over the K index and the DO
loop over J sequentially. The resulting
optical architecture is shown in Figure
1. This system contains a one-dimensional
(1-D) spatial light modulator (SLM), a
point modulator, a 1-D time-integrating
detector array, collimating optics
(for signal broadcasting), and imaging
optics. The basic operation that this
processor performs in one clock cycle of
the 1-D SLM is that of scalar-vector
multiplication, defined by equation [2].

It should be noted that this system uses
an interconnection network for
broadcasting in one dimension.
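
A sequential sketch of this schedule follows. It assumes that the point modulator carries the scalar B(K,J) and the 1-D SLM carries column K of A, which is one reading of the description and not confirmed by the text; the function names are hypothetical.

```python
# N-parallel scalar-vector schedule (hypothetical names, assumed data placement).

def scalar_vector_step(detector, scalar, vector):
    """One SLM clock cycle: broadcast the scalar, multiply, integrate."""
    return [d + scalar * v for d, v in zip(detector, vector)]

def matmul_scalar_vector(a, b):
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for j in range(n):                    # sequential DO loop over J
        detector = [0] * n                # time-integrating array is reset
        for k in range(n):                # sequential summation over K
            column_k = [a[i][k] for i in range(n)]    # assumed SLM contents
            detector = scalar_vector_step(detector, b[k][j], column_k)
        for i in range(n):                # read out column J of C
            c[i][j] = detector[i]
    return c
```

Each call to `scalar_vector_step` stands for N multiplications done at once, so a full product takes N^2 clock cycles in this scheme.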

(ii) This choice of the DO loop
index (K) leads to performing N
multiplications and additions in
parallel. The resulting architecture is
shown in Figure 2. This system contains
two 1-D SLMs, a point detector, and
focussing optics (for fan-in) and
imaging optics. The basic operation that
this processor performs in one clock
cycle of the 1-D SLMs is that of a vector
inner product, defined by equation [3].

It should be noted that this system uses
an interconnection network for fanning in
N data channels in one dimension.
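
The inner product schedule can be sketched the same way. The assignment of a row of A and a column of B to the two 1-D SLMs is an assumption, and the function names are hypothetical.

```python
# N-parallel inner product schedule (hypothetical names, assumed data placement).

def inner_product(u, v):
    """One clock cycle: N products fanned in onto a single point detector."""
    return sum(x * y for x, y in zip(u, v))

def matmul_inner_product(a, b):
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):                    # sequential DO loop over I
        for j in range(n):                # sequential DO loop over J
            column_j = [b[k][j] for k in range(n)]
            c[i][j] = inner_product(a[i], column_j)
    return c
```

Here the summation over K happens in space (fan-in) instead of in time, so no time-integrating detector is needed, and the product again takes N^2 clock cycles.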

Neither of these optical architectures
fully exploits the 2-D parallelism
of optics. However, their 1-D nature
allows for an integrated optic
implementation that permits higher
speed operation (100 MHz) in a small
and rugged package. It should also be
noted that the operations described above
are of interest in and by themselves for
signal processing/symbolic processing
operations and do not have to be
considered in the context of matrix
multiplication alone. The scalar-vector
multiplication is useful in
calculating a weighted version of a given
input, and the vector-vector inner product
can give a similarity measure between the
two vectors to be compared.

The optical processors in this
category (the two-dimensional systems)
perform N^2 multiplications
and/or additions in parallel and hence
fully exploit the 2-D parallelism offered
by optics. If we look at the program
listing, we get two distinct choices
for the two DO loops that we can
implement in parallel with an optical
system: (i) the first choice involves
performing the DO loops over the I and J
indices in parallel; (ii) the second choice
involves performing the DO loops over I
(or J) and K in parallel.

(i) This choice of DO loop indices
(I and J) leads to performing N^2
multiplications in parallel and the
summation over the K index sequentially.
The resulting optical architecture is
shown in Figure 3. This system contains
two 1-D SLMs arranged orthogonal to each
other, optics for collimating and
focussing simultaneously along orthogonal
directions, and a 2-D time-integrating
detector array. The basic operation that
this processor performs in one clock
cycle of the 1-D SLMs is that of a
vector-vector outer product, defined in
equation [4]:

$$C' = \mathbf{a}\,\mathbf{b}^{T} \qquad [4]$$

where C' is a rank one matrix that is N x N
in dimension. It should be noted that
this optical architecture uses two 1-D
input arrays and yet calculates a 2-D
output array. The interconnections
utilized by this system are quite complex
in that they involve broadcasting along
one spatial dimension and fanning in
along the orthogonal spatial direction. One
element of the 1-D SLM encoding one
element of the vector b is simultaneously
accessed by N elements of the other input
vector a without contention. The data
path for an element of vector a retains
its identity, so after being multiplied
by an element of vector b, it is directed
to a specific location on the 2-D
time-integrating detector array as shown in
Figure 3. This system can be considered
to be a spatially multiplexed version of
the 1-D scalar-vector multiplier depicted
in Figure 1.
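
The outer product schedule can be sketched as N rank one updates accumulated on a 2-D array. Pairing column K of A with row K of B follows from equation [1]; the function names are hypothetical.

```python
# N^2-parallel outer product schedule (hypothetical names).

def outer_product(a_col, b_row):
    """One clock cycle: every element of a_col meets every element of b_row."""
    return [[x * y for y in b_row] for x in a_col]

def matmul_outer_product(a, b):
    n = len(a)
    detector = [[0] * n for _ in range(n)]    # 2-D time-integrating array
    for k in range(n):                        # sequential summation over K
        a_col = [a[i][k] for i in range(n)]
        rank_one = outer_product(a_col, b[k])
        for i in range(n):
            for j in range(n):
                detector[i][j] += rank_one[i][j]
    return detector
```

Only the loop over K remains sequential, so the full product takes N clock cycles in this scheme.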

(ii) This choice of DO loop indices
(I and K) leads to performing
N^2 multiplications and additions in
parallel and performing the DO loop over
the remaining index (J) sequentially.
The resulting optical architecture is
shown in Figure 4. This system contains
one 1-D SLM, one 2-D SLM, one 1-D
non-integrating detector array, and
optics for imaging/collimating on the
input side and for imaging/focussing on
the output side. The basic operation that
this processor performs is that of
multiplying a column vector of matrix B
by matrix A, defined in equation [5]:

$$\mathbf{c} = A\,\mathbf{b} \qquad [5]$$

The interconnections utilized by this
system involve broadcasting the vector b
to all rows of matrix A on the input side
and fanning in the product of a row of A
and vector b onto a specific detector
element on the output side. This system
can be considered to be a spatially
multiplexed version of the 1-D
vector-vector inner product processor
depicted in Figure 2.
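
Equation [5] applied once per column of B gives the full product. The sketch below assumes the 2-D SLM holds A and the 1-D SLM holds one column of B per clock cycle; the function names are hypothetical.

```python
# N^2-parallel matrix-vector schedule (hypothetical names, assumed data placement).

def matrix_vector(a, b_col):
    """One clock cycle: broadcast b_col to all rows of A and fan in each row."""
    return [sum(x * y for x, y in zip(row, b_col)) for row in a]

def matmul_matrix_vector(a, b):
    n = len(a)
    columns = []
    for j in range(n):                        # sequential DO loop over J
        b_col = [b[k][j] for k in range(n)]
        columns.append(matrix_vector(a, b_col))    # column J of C
    # The detector readouts are columns of C; rearrange them into rows.
    return [[columns[j][i] for j in range(n)] for i in range(n)]
```

As with the outer product scheme, N clock cycles suffice, but the summation is done by spatial fan-in, so the detector array does not need to integrate in time.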

The optical processors in this
category do utilize the 2-D parallelism
of optical systems by employing the
spatial variables for multiplexing and/or
for integration. These two operations are
also useful in and by themselves in
signal and image processing. The
operation of outer product is critical in
synthesizing a complicated matrix (e.g.
an image) from several simple
"primitive" images corresponding to rank
one matrices. The vector-matrix
multiplication implements a generalized
linear transformation on a 1-D input. The
optics utilized by these systems contain
off-the-shelf components like spherical
and cylindrical lenses to give different
properties along the two orthogonal
directions.

N^3-PARALLEL OPTICAL SYSTEMS

The optical processors in this
category perform N^3 multiplications and
additions in parallel. Since the
matrix-matrix multiplication involves N^3
multiplications/additions, this class of
optical processors will perform the
operation in one clock cycle of the
active devices involved. Since we are
performing all the DO loops in parallel,
there is only one way of formulating the
processor mathematically. Different
architectures result, however, when
designing such a system optically.


The schematic diagram of the
N^3-parallel optical system is shown in
Figure 5. This system uses two 2-D SLMs
for inputting matrices A and B and a 2-D
detector array for detecting the output
matrix C. The optics involved is
complicated and has to be implemented via
computer generated holography.

One intriguing feature of this
architecture is that it employs only N^2
active elements and still performs N^3
operations in parallel. Therefore the
efficiency of this architecture as a
parallel processor (computational speedup
/ number of processing elements) is N,
which can be much larger than 1! In most
other designs for parallel processors the
goal is to achieve an efficiency of 1,
indicating a linear speedup with the
number of processors. This indeed
represents a unique feature of optics in
that it utilizes the parallel, contention
free, multiple-access communication
capability of optics to add an extra
dimension to the computational power of a
parallel processor. Since the operation of
matrix-matrix multiplication is a well
structured operation with little
interdependence between the calculations
of the elements of the output matrix,
this feature of optical systems can be
exploited to the fullest extent to achieve
superlinear speedup in a parallel
architecture.
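
The efficiency figure can be written out as a short worked equation. It assumes the speedup is measured against a sequential processor that performs one multiplication and addition per clock cycle, and it counts N^2 active elements as the text does; the symbol for efficiency is introduced here only for illustration.

```latex
\eta = \frac{\text{speedup}}{\text{number of processing elements}}
     = \frac{N^{3}}{N^{2}} = N
```

For example, with N = 32 the speedup would be 32^3 = 32768 using 32^2 = 1024 active elements, giving an efficiency of 32.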


The N^3-parallel optical processor
can be viewed as a multiplexed version of
the N^2-parallel optical architectures.
Since in the earlier section we discussed
two different optical systems for those
architectures, we can view the
N^3-parallel architecture as a multiplexed
version of either the outer product
optical processor or the vector-matrix
optical processor. It should be
emphasized that this division is strictly
for conceptual clarity and is not
indicative of a deeper category.


Figure 6(a) shows the schematic
diagram of an N^3-parallel optical
processor viewed as N outer product
optical processors that are spatially
multiplexed. In the system depicted in
Figure 3 the addition of the outer
product matrices for calculating the
output matrix C was performed by
integration in time on the 2-D detector
array. In Figure 6(a), all the outer
products are simultaneously calculated
and are therefore added by integration of
N terms in space by fanning in the
appropriate signals emerging from the
elements of the second matrix B. Each
column of matrix A is encoded by a light
beam traveling at an appropriate angle so
as to interact with the correct row of
matrix B as indicated in Figure 3. On the
output detector array, the results of the
different outer products performed in
parallel converge, thus performing the
final integration. Figure 6(a) shows the
optical paths for only two outer products
for clarity.

Figure  6(b)  shows  the  schematic 
diagram  of  the  N3 -parallel  optical 
processor  viewed  as  a  spatially 
multiplexed  version  of  the  optical 
vector-matrix  multiplier  shown  in  Figure 
4. In that system, the operation of
vector-matrix multiplication was
performed in one cycle generating one row
of  the  output  matrix  C.  The  full  answer 
was  calculated  by  generating  different 
rows  of  matrix  C  in  a  time-sequential 
fashion.  In  the  system  depicted  in  Figure 
6(b),  all  rows  of  matrix  A  are  available 
simultaneously  while  the  matrix  B  is  used 
in  N  different  vector-matrix  products 
simultaneously.  The  resultant  N  rows  of 
the  output  matrix  C  are  spatially 
separated  and  are  detected  by  the  rows  of 
the  output  detector  array.  In  this 
processor,  each  row  of  matrix  A  is 
encoded  by  a  light  beam  traveling  at  an 
appropriate  angle.  At  matrix  B,  all  rows 
of A are simultaneously accessing the
elements  of  B  while  keeping  their 
distinct  identity.  The  unique  encoding  of 
the  light  beam  for  each  row  of  A  causes 
the  output  to  be  separated  on  the 
detector  array  thus  providing  all  rows  of 
matrix  C  in  parallel. 

CONCLUSION 

Optical processors offer the unique
features of two-dimensional parallelism
in the basic arithmetic operations of
analog multiplication and addition, and
global, parallel, and contention-free
communication between the two-dimensional
arrays of simple processing elements.
These features can be exploited to build
optical processors for performing the
general operation of matrix-matrix
multiplication. In this paper, we
outlined several different architectures
of parallel optical processors for
performing this operation with varying
degrees of parallelism. In each category
the type of communication involved was
emphasized. A unique optical architecture
was described that uses the
contention-free communication offered by
optics to obtain a speedup of N^3 with a
system containing only N^2 active
processing elements.

Figure 1. The N-parallel scalar-vector product
optical processor. (Imaging optics omitted.)

Figure 2. The N-parallel inner product optical
processor. (Imaging optics omitted.)

Figure 3. The N^2-parallel outer product optical
processor.

Figure 4. The N^2-parallel vector-matrix optical
processor.

Figure 5. The N^3-parallel optical processor.

Figure 6(a). The N^3-parallel optical
processor viewed as a multiplexed outer
product processor.